---
title: tidyr study
author: 宇飞的世界
date: '2021-04-30'
slug: tidyr-study
categories:
  - tidyverse
tags:
  - tidyr
---

<script src="{{< blogdown/postref >}}index_files/accessible-code-block/empty-anchor.js"></script>
<style type="text/css">
code.sourceCode > span { display: inline-block; line-height: 1.25; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
  { counter-reset: source-line 0; }
pre.numberSource code > span
  { position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
  { content: counter(source-line);
    position: relative; left: -1em; text-align: right; vertical-align: baseline;
    border: none; display: inline-block;
    -webkit-touch-callout: none; -webkit-user-select: none;
    -khtml-user-select: none; -moz-user-select: none;
    -ms-user-select: none; user-select: none;
    padding: 0 4px; width: 4em;
    color: #aaaaaa;
  }
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
div.sourceCode
  {  background-color: #f8f8f8; }
@media screen {
code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ef2929; } /* Alert */
code span.an { color: #8f5902; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #c4a000; } /* Attribute */
code span.bn { color: #0000cf; } /* BaseN */
code span.cf { color: #204a87; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4e9a06; } /* Char */
code span.cn { color: #000000; } /* Constant */
code span.co { color: #8f5902; font-style: italic; } /* Comment */
code span.cv { color: #8f5902; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #8f5902; font-weight: bold; font-style: italic; } /* Documentation */
code span.dt { color: #204a87; } /* DataType */
code span.dv { color: #0000cf; } /* DecVal */
code span.er { color: #a40000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #0000cf; } /* Float */
code span.fu { color: #000000; } /* Function */
code span.im { } /* Import */
code span.in { color: #8f5902; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #204a87; font-weight: bold; } /* Keyword */
code span.op { color: #ce5c00; font-weight: bold; } /* Operator */
code span.ot { color: #8f5902; } /* Other */
code span.pp { color: #8f5902; font-style: italic; } /* Preprocessor */
code span.sc { color: #000000; } /* SpecialChar */
code span.ss { color: #4e9a06; } /* SpecialString */
code span.st { color: #4e9a06; } /* String */
code span.va { color: #000000; } /* Variable */
code span.vs { color: #4e9a06; } /* VerbatimString */
code span.wa { color: #8f5902; font-weight: bold; font-style: italic; } /* Warning */
</style>

<div id="TOC">
<ul>
<li><a href="#前言">Introduction</a></li>
<li><a href="#安装">Installation</a></li>
<li><a href="#主要功能">Main features</a><ul>
<li><a href="#宽转长">Wide to long</a></li>
<li><a href="#长转宽">Long to wide</a></li>
<li><a href="#处理jsonhtml的数据">Working with JSON and HTML data</a></li>
<li><a href="#嵌套数据">Nested data</a></li>
<li><a href="#嵌套数据和模型">Nested data and models</a></li>
<li><a href="#拆分和合并">Splitting and combining</a></li>
<li><a href="#缺失值处理">Handling missing values</a></li>
</ul></li>
</ul>
</div>

<div id="前言" class="section level2">
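<h2>Introduction</h2>
<p><code>tidyr</code> is one of the core packages of the <code>tidyverse</code>. Its central purpose is to help you create tidy data, which has three characteristics:</p>
<ul>
<li>Every column is a variable.</li>
<li>Every row is an observation.</li>
<li>Every cell is a single value.</li>
</ul>
<p>In day-to-day data work this is the standard way of storing data, much like the way data is stored in the tables of a relational database.</p>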
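<p>As a concrete illustration, the sketch below uses two small example tables that ship with <code>tidyr</code>, <code>table1</code> and <code>table4a</code>. <code>table1</code> already follows the three rules above. <code>table4a</code> spreads a single variable, <code>cases</code>, across the columns <code>1999</code> and <code>2000</code>, so the year is stored in column names instead of in a column of its own. One <code>pivot_longer()</code> call (covered in detail below) moves it back to the tidy layout.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tidyr)

# Tidy: one variable per column (country, year, cases, population)
table1

# Not tidy: the year variable is hidden in the column names `1999` and `2000`
table4a

# Move the year columns into rows: one row per country-year observation
tidy_cases = pivot_longer(table4a, cols = c(`1999`, `2000`),
                          names_to = "year", values_to = "cases")
tidy_cases</code></pre></div>
<p>The result has the same <code>country</code> and <code>cases</code> values, in the same order, as the corresponding columns of <code>table1</code>; only <code>year</code> comes back as character text taken from the old column names.</p>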
</div>
<div id="安装" class="section level2">
<h2>Installation</h2>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1"></a><span class="co">## The easiest way is to install the whole tidyverse</span></span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="kw">install.packages</span>(<span class="st">&#39;tidyverse&#39;</span>)</span>
<span id="cb1-3"><a href="#cb1-3"></a></span>
<span id="cb1-4"><a href="#cb1-4"></a><span class="co">## Or install just tidyr:</span></span>
<span id="cb1-5"><a href="#cb1-5"></a><span class="kw">install.packages</span>(<span class="st">&#39;tidyr&#39;</span>)</span>
<span id="cb1-6"><a href="#cb1-6"></a></span>
<span id="cb1-7"><a href="#cb1-7"></a><span class="co">## Or install the development version from GitHub</span></span>
<span id="cb1-8"><a href="#cb1-8"></a><span class="co">## install.packages(&quot;devtools&quot;)</span></span>
<span id="cb1-9"><a href="#cb1-9"></a>devtools<span class="op">::</span><span class="kw">install_github</span>(<span class="st">&quot;tidyverse/tidyr&quot;</span>)</span></code></pre></div>
</div>
<div id="主要功能" class="section level2">
<h2>Main features</h2>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1"></a><span class="kw">library</span>(tidyr)</span></code></pre></div>
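<p>The functions in <code>tidyr</code> fall into five main groups:</p>
<ul>
<li><p><code>pivot_longer()</code> and <code>pivot_wider()</code>: reshape data from wide to long and from long to wide</p></li>
<li><p><code>unnest_longer()</code>, <code>unnest_wider()</code> and <code>hoist()</code>: turn nested lists into tidy data</p></li>
<li><p><code>nest()</code>: create nested data, where each group collapses into a single row holding a data frame</p></li>
<li><p><code>separate()</code> and <code>extract()</code>: split one column into several, or extract new columns from it</p></li>
<li><p><code>replace_na()</code>: handle missing values</p></li>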
</ul>
<div id="宽转长" class="section level3">
<h3>Wide to long</h3>
<p>For details see <code>vignette("pivot")</code>; the examples below are taken directly from that vignette.</p>
<div id="基础" class="section level4">
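<h4>Basics</h4>
<p>Converting between long and wide data is similar to the pivot table feature in Excel. The notes below use the example datasets that ship with <code>tidyr</code> to show how the relevant functions are used.</p>
<p>In Excel, a dataset is sometimes laid out with many columns so that it is easy to read by eye, as shown below:</p>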
<table>
<thead>
<tr class="header">
<th>col1</th>
<th>col2</th>
<th>col3</th>
<th>col4</th>
<th>col5</th>
<th>col6</th>
<th>col7</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>v1</td>
<td>v2</td>
<td>v3</td>
<td>v4</td>
<td>v5</td>
<td>v6</td>
<td>v7</td>
</tr>
<tr class="even">
<td>vb1</td>
<td>vb2</td>
<td>vb3</td>
<td>vb4</td>
<td>vb5</td>
<td>vb6</td>
<td>vb7</td>
</tr>
</tbody>
</table>
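<p>This layout is easy to read but inconvenient for statistical analysis, so we need to reshape the data from “wide” to “long”, i.e. pivot it longer.</p>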
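<p>Here is a minimal sketch of that reshaping applied to the table above, assuming the <code>tibble</code> package is available for <code>tribble()</code>. The <code>row</code> column is a hypothetical identifier added for illustration so that each value can still be traced back to its original row; the cell values are just the placeholders from the table.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tidyr)
library(tibble)

# The wide table above, plus a hypothetical `row` id column
wide = tribble(
  ~row, ~col1, ~col2, ~col3, ~col4, ~col5, ~col6, ~col7,
  "a",  "v1",  "v2",  "v3",  "v4",  "v5",  "v6",  "v7",
  "b",  "vb1", "vb2", "vb3", "vb4", "vb5", "vb6", "vb7"
)

# Stack col1..col7 into two columns: the old column name and its value
long = pivot_longer(wide, cols = starts_with("col"),
                    names_to = "column", values_to = "value")
long  # 2 rows x 7 value columns become 14 rows x 3 columns (row, column, value)</code></pre></div>
<p>Each row of the long result is one cell of the wide table, so grouping, filtering or summarising by <code>column</code> becomes a single operation instead of seven.</p>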
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1"></a><span class="kw">library</span>(tidyr)</span>
<span id="cb3-2"><a href="#cb3-2"></a><span class="kw">library</span>(dplyr)</span>
<span id="cb3-3"><a href="#cb3-3"></a><span class="kw">library</span>(readr)</span></code></pre></div>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1"></a>relig_income <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb4-2"><a href="#cb4-2"></a><span class="st">  </span><span class="kw">pivot_longer</span>(<span class="dt">cols =</span> <span class="op">!</span>religion,<span class="dt">names_to =</span> <span class="st">&#39;income&#39;</span>,<span class="dt">values_to =</span> <span class="st">&quot;count&quot;</span>)</span>
<span id="cb4-3"><a href="#cb4-3"></a><span class="co">## # A tibble: 180 x 3</span></span>
<span id="cb4-4"><a href="#cb4-4"></a><span class="co">##    religion income             count</span></span>
<span id="cb4-5"><a href="#cb4-5"></a><span class="co">##    &lt;chr&gt;    &lt;chr&gt;              &lt;dbl&gt;</span></span>
<span id="cb4-6"><a href="#cb4-6"></a><span class="co">##  1 Agnostic &lt;$10k                 27</span></span>
<span id="cb4-7"><a href="#cb4-7"></a><span class="co">##  2 Agnostic $10-20k               34</span></span>
<span id="cb4-8"><a href="#cb4-8"></a><span class="co">##  3 Agnostic $20-30k               60</span></span>
<span id="cb4-9"><a href="#cb4-9"></a><span class="co">##  4 Agnostic $30-40k               81</span></span>
<span id="cb4-10"><a href="#cb4-10"></a><span class="co">##  5 Agnostic $40-50k               76</span></span>
<span id="cb4-11"><a href="#cb4-11"></a><span class="co">##  6 Agnostic $50-75k              137</span></span>
<span id="cb4-12"><a href="#cb4-12"></a><span class="co">##  7 Agnostic $75-100k             122</span></span>
<span id="cb4-13"><a href="#cb4-13"></a><span class="co">##  8 Agnostic $100-150k            109</span></span>
<span id="cb4-14"><a href="#cb4-14"></a><span class="co">##  9 Agnostic &gt;150k                 84</span></span>
<span id="cb4-15"><a href="#cb4-15"></a><span class="co">## 10 Agnostic Don&#39;t know/refused    96</span></span>
<span id="cb4-16"><a href="#cb4-16"></a><span class="co">## # … with 170 more rows</span></span></code></pre></div>
<ul>
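<li>The first argument is the dataset.</li>
<li>The second argument, <code>cols</code>, specifies which columns to reshape; in this example it is every column except <code>religion</code>.</li>
<li><code>names_to</code> names the new column that stores the former column names (here, the income brackets).</li>
<li><code>values_to</code> names the new column that stores the cell values from the original dataset.</li>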
</ul>
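<p>The <code>cols</code> argument uses tidyselect syntax, so the same set of columns can be chosen in several ways. The sketch below checks that <code>!religion</code>, <code>-religion</code> and the positional range <code>2:11</code> give identical results for <code>relig_income</code>. The positional form only works because <code>religion</code> is the first of its 11 columns and would pick the wrong columns if the order changed, so selecting by name is usually the safer choice.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tidyr)

# Three ways to select every column except `religion`
by_bang     = pivot_longer(relig_income, cols = !religion, names_to = "income", values_to = "count")
by_minus    = pivot_longer(relig_income, cols = -religion, names_to = "income", values_to = "count")
by_position = pivot_longer(relig_income, cols = 2:11,      names_to = "income", values_to = "count")

identical(by_bang, by_minus)     # TRUE
identical(by_bang, by_position)  # TRUE</code></pre></div>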
</div>
<div id="列名带数字" class="section level4">
<h4>Column names containing numbers</h4>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1"></a>billboard <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb5-2"><a href="#cb5-2"></a><span class="st">  </span><span class="kw">pivot_longer</span>(</span>
<span id="cb5-3"><a href="#cb5-3"></a>    <span class="dt">cols =</span> <span class="kw">starts_with</span>(<span class="st">&quot;wk&quot;</span>), </span>
<span id="cb5-4"><a href="#cb5-4"></a>    <span class="dt">names_to =</span> <span class="st">&quot;week&quot;</span>, </span>
<span id="cb5-5"><a href="#cb5-5"></a>    <span class="dt">values_to =</span> <span class="st">&quot;rank&quot;</span>,</span>
<span id="cb5-6"><a href="#cb5-6"></a>    <span class="dt">values_drop_na =</span> <span class="ot">TRUE</span></span>
<span id="cb5-7"><a href="#cb5-7"></a>  )</span>
<span id="cb5-8"><a href="#cb5-8"></a><span class="co">## # A tibble: 5,307 x 5</span></span>
<span id="cb5-9"><a href="#cb5-9"></a><span class="co">##    artist  track                   date.entered week   rank</span></span>
<span id="cb5-10"><a href="#cb5-10"></a><span class="co">##    &lt;chr&gt;   &lt;chr&gt;                   &lt;date&gt;       &lt;chr&gt; &lt;dbl&gt;</span></span>
<span id="cb5-11"><a href="#cb5-11"></a><span class="co">##  1 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk1      87</span></span>
<span id="cb5-12"><a href="#cb5-12"></a><span class="co">##  2 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk2      82</span></span>
<span id="cb5-13"><a href="#cb5-13"></a><span class="co">##  3 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk3      72</span></span>
<span id="cb5-14"><a href="#cb5-14"></a><span class="co">##  4 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk4      77</span></span>
<span id="cb5-15"><a href="#cb5-15"></a><span class="co">##  5 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk5      87</span></span>
<span id="cb5-16"><a href="#cb5-16"></a><span class="co">##  6 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk6      94</span></span>
<span id="cb5-17"><a href="#cb5-17"></a><span class="co">##  7 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26   wk7      99</span></span>
<span id="cb5-18"><a href="#cb5-18"></a><span class="co">##  8 2Ge+her The Hardest Part Of ... 2000-09-02   wk1      91</span></span>
<span id="cb5-19"><a href="#cb5-19"></a><span class="co">##  9 2Ge+her The Hardest Part Of ... 2000-09-02   wk2      87</span></span>
<span id="cb5-20"><a href="#cb5-20"></a><span class="co">## 10 2Ge+her The Hardest Part Of ... 2000-09-02   wk3      92</span></span>
<span id="cb5-21"><a href="#cb5-21"></a><span class="co">## # … with 5,297 more rows</span></span></code></pre></div>
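<p><code>names_prefix</code> removes a matching prefix from the new values (here, the <code>wk</code> in <code>wk1</code>, <code>wk2</code> and so on); combine it with <code>names_transform</code> to convert the remaining text to another type.</p>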
<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1"></a>billboard <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb6-2"><a href="#cb6-2"></a><span class="st">  </span><span class="kw">pivot_longer</span>(</span>
<span id="cb6-3"><a href="#cb6-3"></a>    <span class="dt">cols =</span> <span class="kw">starts_with</span>(<span class="st">&quot;wk&quot;</span>), </span>
<span id="cb6-4"><a href="#cb6-4"></a>    <span class="dt">names_to =</span> <span class="st">&quot;week&quot;</span>, </span>
<span id="cb6-5"><a href="#cb6-5"></a>    <span class="dt">names_prefix =</span> <span class="st">&quot;wk&quot;</span>,</span>
<span id="cb6-6"><a href="#cb6-6"></a>    <span class="dt">names_transform =</span> <span class="kw">list</span>(<span class="dt">week =</span> as.integer),</span>
<span id="cb6-7"><a href="#cb6-7"></a>    <span class="dt">values_to =</span> <span class="st">&quot;rank&quot;</span>,</span>
<span id="cb6-8"><a href="#cb6-8"></a>    <span class="dt">values_drop_na =</span> <span class="ot">TRUE</span>,</span>
<span id="cb6-9"><a href="#cb6-9"></a>  )</span>
<span id="cb6-10"><a href="#cb6-10"></a><span class="co">## # A tibble: 5,307 x 5</span></span>
<span id="cb6-11"><a href="#cb6-11"></a><span class="co">##    artist  track                   date.entered  week  rank</span></span>
<span id="cb6-12"><a href="#cb6-12"></a><span class="co">##    &lt;chr&gt;   &lt;chr&gt;                   &lt;date&gt;       &lt;int&gt; &lt;dbl&gt;</span></span>
<span id="cb6-13"><a href="#cb6-13"></a><span class="co">##  1 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       1    87</span></span>
<span id="cb6-14"><a href="#cb6-14"></a><span class="co">##  2 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       2    82</span></span>
<span id="cb6-15"><a href="#cb6-15"></a><span class="co">##  3 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       3    72</span></span>
<span id="cb6-16"><a href="#cb6-16"></a><span class="co">##  4 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       4    77</span></span>
<span id="cb6-17"><a href="#cb6-17"></a><span class="co">##  5 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       5    87</span></span>
<span id="cb6-18"><a href="#cb6-18"></a><span class="co">##  6 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       6    94</span></span>
<span id="cb6-19"><a href="#cb6-19"></a><span class="co">##  7 2 Pac   Baby Don&#39;t Cry (Keep... 2000-02-26       7    99</span></span>
<span id="cb6-20"><a href="#cb6-20"></a><span class="co">##  8 2Ge+her The Hardest Part Of ... 2000-09-02       1    91</span></span>
<span id="cb6-21"><a href="#cb6-21"></a><span class="co">##  9 2Ge+her The Hardest Part Of ... 2000-09-02       2    87</span></span>
<span id="cb6-22"><a href="#cb6-22"></a><span class="co">## 10 2Ge+her The Hardest Part Of ... 2000-09-02       3    92</span></span>
<span id="cb6-23"><a href="#cb6-23"></a><span class="co">## # … with 5,297 more rows</span></span></code></pre></div>
<p>After this transformation the <code>week</code> column is an integer. There are other ways to achieve a similar result:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1"></a><span class="kw">library</span>(tidyverse)  <span class="co"># also attaches dplyr, readr and stringr</span></span>
<span id="cb7-2"><a href="#cb7-2"></a></span>
<span id="cb7-3"><a href="#cb7-3"></a><span class="co"># method 1: readr::parse_number() extracts the number from &quot;wk1&quot;, &quot;wk2&quot;, and so on (note: it returns a double, not an integer)</span></span>
<span id="cb7-4"><a href="#cb7-4"></a>billboard <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb7-5"><a href="#cb7-5"></a><span class="st">  </span><span class="kw">pivot_longer</span>(</span>
<span id="cb7-6"><a href="#cb7-6"></a>    <span class="dt">cols =</span> <span class="kw">starts_with</span>(<span class="st">&quot;wk&quot;</span>), </span>
<span id="cb7-7"><a href="#cb7-7"></a>    <span class="dt">names_to =</span> <span class="st">&quot;week&quot;</span>, </span>
<span id="cb7-8"><a href="#cb7-8"></a>    <span class="dt">names_transform =</span> <span class="kw">list</span>(<span class="dt">week =</span> readr<span class="op">::</span>parse_number),</span>
<span id="cb7-9"><a href="#cb7-9"></a>    <span class="dt">values_to =</span> <span class="st">&quot;rank&quot;</span>,</span>
<span id="cb7-10"><a href="#cb7-10"></a>    <span class="dt">values_drop_na =</span> <span class="ot">TRUE</span>,</span>
<span id="cb7-11"><a href="#cb7-11"></a>  )</span>
<span id="cb7-12"><a href="#cb7-12"></a></span>
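<span id="cb7-13"><a href="#cb7-13"></a><span class="co"># method 2: pivot first, then strip the &quot;wk&quot; prefix with stringr::str_remove() and convert to integer</span></span>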
<span id="cb7-14"><a href="#cb7-14"></a>billboard <span class="op">%&gt;%</span></span>
<span id="cb7-15"><a href="#cb7-15"></a><span class="st">  </span><span class="kw">pivot_longer</span>(</span>
<span id="cb7-16"><a href="#cb7-16"></a>    <span class="dt">cols =</span> <span class="kw">starts_with</span>(<span class="st">&quot;wk&quot;</span>),</span>
<span id="cb7-17"><a href="#cb7-17"></a>    <span class="dt">names_to =</span> <span class="st">&quot;week&quot;</span>,</span>
<span id="cb7-18"><a href="#cb7-18"></a>    <span class="dt">values_to =</span> <span class="st">&quot;rank&quot;</span>,</span>
<span id="cb7-19"><a href="#cb7-19"></a>    <span class="dt">values_drop_na =</span> <span class="ot">TRUE</span>,</span>
<span id="cb7-20"><a href="#cb7-20"></a>  ) <span class="op">%&gt;%</span></span>
<span id="cb7-21"><a href="#cb7-21"></a><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">week =</span> <span class="kw">str_remove</span>(week, <span class="st">&quot;wk&quot;</span>) <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">as.integer</span>())</span></code></pre></div>
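<p>The two <code>names_transform</code> choices are not quite interchangeable: <code>as.integer</code> (after <code>names_prefix</code> has stripped <code>wk</code>) yields an integer column, whereas <code>readr::parse_number</code> returns doubles. A quick check, assuming <code>readr</code> is installed:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tidyr)

# names_prefix + as.integer: "wk1" -> "1" -> 1L
via_int = pivot_longer(billboard, cols = starts_with("wk"), names_to = "week",
                       names_prefix = "wk", names_transform = list(week = as.integer),
                       values_to = "rank", values_drop_na = TRUE)

# readr::parse_number: "wk1" -> 1 (a double)
via_num = pivot_longer(billboard, cols = starts_with("wk"), names_to = "week",
                       names_transform = list(week = readr::parse_number),
                       values_to = "rank", values_drop_na = TRUE)

class(via_int$week)                # "integer"
class(via_num$week)                # "numeric"
all(via_int$week == via_num$week)  # TRUE: same values, different storage type</code></pre></div>
<p>For most analyses the difference does not matter, but it shows up when you compare or join on <code>week</code> with <code>identical()</code> or a strictly typed join key.</p>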
</div>
<div id="多变量列名" class="section level4">
<h4>Multiple variables in column names</h4>
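<p>This example involves a fairly complex regular expression, <code>new_?(.*)_(.)(.*)</code>, so some familiarity with regular expressions helps.
<code>new_?</code> matches <code>new</code> or <code>new_</code>; <code>(.*)</code> matches any character zero or more times; <code>_</code> matches a literal underscore; <code>(.)</code> matches exactly one character; and the final <code>(.*)</code> captures whatever remains. The three capture groups fill the three columns named in <code>names_to</code>: <code>diagnosis</code>, <code>gender</code> and <code>age</code>.</p>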
<p><a href="https://www.runoob.com/regexp/regexp-syntax.html">Introduction to regular expression syntax (in Chinese)</a></p>
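<p>To see what <code>names_pattern</code> will extract before pivoting, you can try the regular expression on a few column names directly. A sketch using <code>stringr::str_match()</code> (assuming <code>stringr</code> is installed), which returns the full match in the first column and one column per capture group:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(stringr)

# A few column names from the `who` dataset
nms = c("new_sp_m014", "new_ep_f1524", "newrel_m65")

# Column 1: full match; columns 2-4: the three capture groups
groups = str_match(nms, "new_?(.*)_(.)(.*)")
groups[, 2:4]
# diagnosis: "sp", "ep", "rel"; gender: "m", "f", "m"; age: "014", "1524", "65"</code></pre></div>
<p>Note how <code>newrel_m65</code>, which has no underscore after <code>new</code>, is still handled because the underscore in <code>new_?</code> is optional.</p>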
<div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1"></a>who <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_longer</span>(</span>
<span id="cb8-2"><a href="#cb8-2"></a>  <span class="dt">cols =</span> new_sp_m014<span class="op">:</span>newrel_f65,</span>
<span id="cb8-3"><a href="#cb8-3"></a>  <span class="dt">names_to =</span> <span class="kw">c</span>(<span class="st">&quot;diagnosis&quot;</span>, <span class="st">&quot;gender&quot;</span>, <span class="st">&quot;age&quot;</span>), </span>
<span id="cb8-4"><a href="#cb8-4"></a>  <span class="dt">names_pattern =</span> <span class="st">&quot;new_?(.*)_(.)(.*)&quot;</span>,</span>
<span id="cb8-5"><a href="#cb8-5"></a>  <span class="dt">values_to =</span> <span class="st">&quot;count&quot;</span></span>
<span id="cb8-6"><a href="#cb8-6"></a>)</span>
<span id="cb8-7"><a href="#cb8-7"></a><span class="co">## # A tibble: 405,440 x 8</span></span>
<span id="cb8-8"><a href="#cb8-8"></a><span class="co">##    country     iso2  iso3   year diagnosis gender age   count</span></span>
<span id="cb8-9"><a href="#cb8-9"></a><span class="co">##    &lt;chr&gt;       &lt;chr&gt; &lt;chr&gt; &lt;int&gt; &lt;chr&gt;     &lt;chr&gt;  &lt;chr&gt; &lt;int&gt;</span></span>
<span id="cb8-10"><a href="#cb8-10"></a><span class="co">##  1 Afghanistan AF    AFG    1980 sp        m      014      NA</span></span>
<span id="cb8-11"><a href="#cb8-11"></a><span class="co">##  2 Afghanistan AF    AFG    1980 sp        m      1524     NA</span></span>
<span id="cb8-12"><a href="#cb8-12"></a><span class="co">##  3 Afghanistan AF    AFG    1980 sp        m      2534     NA</span></span>
<span id="cb8-13"><a href="#cb8-13"></a><span class="co">##  4 Afghanistan AF    AFG    1980 sp        m      3544     NA</span></span>
<span id="cb8-14"><a href="#cb8-14"></a><span class="co">##  5 Afghanistan AF    AFG    1980 sp        m      4554     NA</span></span>
<span id="cb8-15"><a href="#cb8-15"></a><span class="co">##  6 Afghanistan AF    AFG    1980 sp        m      5564     NA</span></span>
<span id="cb8-16"><a href="#cb8-16"></a><span class="co">##  7 Afghanistan AF    AFG    1980 sp        m      65       NA</span></span>
<span id="cb8-17"><a href="#cb8-17"></a><span class="co">##  8 Afghanistan AF    AFG    1980 sp        f      014      NA</span></span>
<span id="cb8-18"><a href="#cb8-18"></a><span class="co">##  9 Afghanistan AF    AFG    1980 sp        f      1524     NA</span></span>
<span id="cb8-19"><a href="#cb8-19"></a><span class="co">## 10 Afghanistan AF    AFG    1980 sp        f      2534     NA</span></span>
<span id="cb8-20"><a href="#cb8-20"></a><span class="co">## # … with 405,430 more rows</span></span></code></pre></div>
<p>Going one step further, convert the <code>gender</code> and <code>age</code> columns into factors:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1"></a>who <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_longer</span>(</span>
<span id="cb9-2"><a href="#cb9-2"></a>  <span class="dt">cols =</span> new_sp_m014<span class="op">:</span>newrel_f65,</span>
<span id="cb9-3"><a href="#cb9-3"></a>  <span class="dt">names_to =</span> <span class="kw">c</span>(<span class="st">&quot;diagnosis&quot;</span>, <span class="st">&quot;gender&quot;</span>, <span class="st">&quot;age&quot;</span>), </span>
<span id="cb9-4"><a href="#cb9-4"></a>  <span class="dt">names_pattern =</span> <span class="st">&quot;new_?(.*)_(.)(.*)&quot;</span>,</span>
<span id="cb9-5"><a href="#cb9-5"></a>  <span class="dt">names_transform =</span> <span class="kw">list</span>(</span>
<span id="cb9-6"><a href="#cb9-6"></a>    <span class="dt">gender =</span> <span class="op">~</span><span class="st"> </span>readr<span class="op">::</span><span class="kw">parse_factor</span>(.x, <span class="dt">levels =</span> <span class="kw">c</span>(<span class="st">&quot;f&quot;</span>, <span class="st">&quot;m&quot;</span>)),</span>
<span id="cb9-7"><a href="#cb9-7"></a>    <span class="dt">age =</span> <span class="op">~</span><span class="st"> </span>readr<span class="op">::</span><span class="kw">parse_factor</span>(</span>
<span id="cb9-8"><a href="#cb9-8"></a>      .x,</span>
<span id="cb9-9"><a href="#cb9-9"></a>      <span class="dt">levels =</span> <span class="kw">c</span>(<span class="st">&quot;014&quot;</span>, <span class="st">&quot;1524&quot;</span>, <span class="st">&quot;2534&quot;</span>, <span class="st">&quot;3544&quot;</span>, <span class="st">&quot;4554&quot;</span>, <span class="st">&quot;5564&quot;</span>, <span class="st">&quot;65&quot;</span>), </span>
<span id="cb9-10"><a href="#cb9-10"></a>      <span class="dt">ordered =</span> <span class="ot">TRUE</span></span>
<span id="cb9-11"><a href="#cb9-11"></a>    )</span>
<span id="cb9-12"><a href="#cb9-12"></a>  ),</span>
<span id="cb9-13"><a href="#cb9-13"></a>  <span class="dt">values_to =</span> <span class="st">&quot;count&quot;</span>,</span>
<span id="cb9-14"><a href="#cb9-14"></a>)</span>
<span id="cb9-15"><a href="#cb9-15"></a><span class="co">## # A tibble: 405,440 x 8</span></span>
<span id="cb9-16"><a href="#cb9-16"></a><span class="co">##    country     iso2  iso3   year diagnosis gender age   count</span></span>
<span id="cb9-17"><a href="#cb9-17"></a><span class="co">##    &lt;chr&gt;       &lt;chr&gt; &lt;chr&gt; &lt;int&gt; &lt;chr&gt;     &lt;fct&gt;  &lt;ord&gt; &lt;int&gt;</span></span>
<span id="cb9-18"><a href="#cb9-18"></a><span class="co">##  1 Afghanistan AF    AFG    1980 sp        m      014      NA</span></span>
<span id="cb9-19"><a href="#cb9-19"></a><span class="co">##  2 Afghanistan AF    AFG    1980 sp        m      1524     NA</span></span>
<span id="cb9-20"><a href="#cb9-20"></a><span class="co">##  3 Afghanistan AF    AFG    1980 sp        m      2534     NA</span></span>
<span id="cb9-21"><a href="#cb9-21"></a><span class="co">##  4 Afghanistan AF    AFG    1980 sp        m      3544     NA</span></span>
<span id="cb9-22"><a href="#cb9-22"></a><span class="co">##  5 Afghanistan AF    AFG    1980 sp        m      4554     NA</span></span>
<span id="cb9-23"><a href="#cb9-23"></a><span class="co">##  6 Afghanistan AF    AFG    1980 sp        m      5564     NA</span></span>
<span id="cb9-24"><a href="#cb9-24"></a><span class="co">##  7 Afghanistan AF    AFG    1980 sp        m      65       NA</span></span>
<span id="cb9-25"><a href="#cb9-25"></a><span class="co">##  8 Afghanistan AF    AFG    1980 sp        f      014      NA</span></span>
<span id="cb9-26"><a href="#cb9-26"></a><span class="co">##  9 Afghanistan AF    AFG    1980 sp        f      1524     NA</span></span>
<span id="cb9-27"><a href="#cb9-27"></a><span class="co">## 10 Afghanistan AF    AFG    1980 sp        f      2534     NA</span></span>
<span id="cb9-28"><a href="#cb9-28"></a><span class="co">## # … with 405,430 more rows</span></span></code></pre></div>
</div>
<div id="一行多观测值" class="section level4">
<h4>Multiple observations per row</h4>
<div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>family &lt;-<span class="st"> </span><span class="kw">tribble</span>(</span>
<span id="cb10-2"><a href="#cb10-2"></a>  <span class="op">~</span>family, <span class="op">~</span>dob_child1, <span class="op">~</span>dob_child2, <span class="op">~</span>gender_child1, <span class="op">~</span>gender_child2,</span>
<span id="cb10-3"><a href="#cb10-3"></a>  1L, <span class="st">&quot;1998-11-26&quot;</span>, <span class="st">&quot;2000-01-29&quot;</span>, 1L, 2L,</span>
<span id="cb10-4"><a href="#cb10-4"></a>  2L, <span class="st">&quot;1996-06-22&quot;</span>, <span class="ot">NA</span>, 2L, <span class="ot">NA</span>,</span>
<span id="cb10-5"><a href="#cb10-5"></a>  3L, <span class="st">&quot;2002-07-11&quot;</span>, <span class="st">&quot;2004-04-05&quot;</span>, 2L, 2L,</span>
<span id="cb10-6"><a href="#cb10-6"></a>  4L, <span class="st">&quot;2004-10-10&quot;</span>, <span class="st">&quot;2009-08-27&quot;</span>, 1L, 1L,</span>
<span id="cb10-7"><a href="#cb10-7"></a>  5L, <span class="st">&quot;2000-12-05&quot;</span>, <span class="st">&quot;2005-02-28&quot;</span>, 2L, 1L,</span>
<span id="cb10-8"><a href="#cb10-8"></a>)</span>
<span id="cb10-9"><a href="#cb10-9"></a>family &lt;-<span class="st"> </span>family <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">mutate_at</span>(<span class="kw">vars</span>(<span class="kw">starts_with</span>(<span class="st">&quot;dob&quot;</span>)), parse_date)</span>
<span id="cb10-10"><a href="#cb10-10"></a>family</span>
<span id="cb10-11"><a href="#cb10-11"></a><span class="co">## # A tibble: 5 x 5</span></span>
<span id="cb10-12"><a href="#cb10-12"></a><span class="co">##   family dob_child1 dob_child2 gender_child1 gender_child2</span></span>
<span id="cb10-13"><a href="#cb10-13"></a><span class="co">##    &lt;int&gt; &lt;date&gt;     &lt;date&gt;             &lt;int&gt;         &lt;int&gt;</span></span>
<span id="cb10-14"><a href="#cb10-14"></a><span class="co">## 1      1 1998-11-26 2000-01-29             1             2</span></span>
<span id="cb10-15"><a href="#cb10-15"></a><span class="co">## 2      2 1996-06-22 NA                     2            NA</span></span>
<span id="cb10-16"><a href="#cb10-16"></a><span class="co">## 3      3 2002-07-11 2004-04-05             2             2</span></span>
<span id="cb10-17"><a href="#cb10-17"></a><span class="co">## 4      4 2004-10-10 2009-08-27             1             1</span></span>
<span id="cb10-18"><a href="#cb10-18"></a><span class="co">## 5      5 2000-12-05 2005-02-28             2             1</span></span></code></pre></div>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1"></a></span>
<span id="cb11-2"><a href="#cb11-2"></a>family <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb11-3"><a href="#cb11-3"></a><span class="st">  </span><span class="kw">pivot_longer</span>(</span>
<span id="cb11-4"><a href="#cb11-4"></a>    <span class="op">!</span>family, </span>
<span id="cb11-5"><a href="#cb11-5"></a>    <span class="dt">names_to =</span> <span class="kw">c</span>(<span class="st">&quot;.value&quot;</span>, <span class="st">&quot;child&quot;</span>), </span>
<span id="cb11-6"><a href="#cb11-6"></a>    <span class="dt">names_sep =</span> <span class="st">&quot;_&quot;</span>, </span>
<span id="cb11-7"><a href="#cb11-7"></a>    <span class="dt">values_drop_na =</span> <span class="ot">TRUE</span></span>
<span id="cb11-8"><a href="#cb11-8"></a>  )</span>
<span id="cb11-9"><a href="#cb11-9"></a><span class="co">## # A tibble: 9 x 4</span></span>
<span id="cb11-10"><a href="#cb11-10"></a><span class="co">##   family child  dob        gender</span></span>
<span id="cb11-11"><a href="#cb11-11"></a><span class="co">##    &lt;int&gt; &lt;chr&gt;  &lt;date&gt;      &lt;int&gt;</span></span>
<span id="cb11-12"><a href="#cb11-12"></a><span class="co">## 1      1 child1 1998-11-26      1</span></span>
<span id="cb11-13"><a href="#cb11-13"></a><span class="co">## 2      1 child2 2000-01-29      2</span></span>
<span id="cb11-14"><a href="#cb11-14"></a><span class="co">## 3      2 child1 1996-06-22      2</span></span>
<span id="cb11-15"><a href="#cb11-15"></a><span class="co">## 4      3 child1 2002-07-11      2</span></span>
<span id="cb11-16"><a href="#cb11-16"></a><span class="co">## 5      3 child2 2004-04-05      2</span></span>
<span id="cb11-17"><a href="#cb11-17"></a><span class="co">## 6      4 child1 2004-10-10      1</span></span>
<span id="cb11-18"><a href="#cb11-18"></a><span class="co">## 7      4 child2 2009-08-27      1</span></span>
<span id="cb11-19"><a href="#cb11-19"></a><span class="co">## 8      5 child1 2000-12-05      2</span></span>
<span id="cb11-20"><a href="#cb11-20"></a><span class="co">## 9      5 child2 2005-02-28      1</span></span></code></pre></div>
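<p>The key here is the special name <code>.value</code> in <code>names_to</code>. With <code>names_sep = "_"</code>, a column name such as <code>dob_child1</code> is split into <code>dob</code> and <code>child1</code>; because the first piece is matched to <code>.value</code>, it becomes the name of an output column (<code>dob</code> or <code>gender</code>), while the second piece goes into the ordinary <code>child</code> column. A smaller, hypothetical dataset makes the mechanism easier to see:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(tidyr)
library(tibble)

# Hypothetical data: two measures (a, b), each recorded at two times (t1, t2)
measures = tibble(id = 1:2,
                  a_t1 = c(1, 2), a_t2 = c(3, 4),
                  b_t1 = c(5, 6), b_t2 = c(7, 8))

# The ".value" piece (a or b) names the output columns;
# the other piece is stored in a regular column called `time`
res = pivot_longer(measures, !id, names_to = c(".value", "time"), names_sep = "_")
res  # 4 rows with columns id, time, a, b</code></pre></div>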
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1"></a>anscombe <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb12-2"><a href="#cb12-2"></a><span class="st">  </span><span class="kw">pivot_longer</span>(<span class="kw">everything</span>(), </span>
<span id="cb12-3"><a href="#cb12-3"></a>    <span class="dt">names_to =</span> <span class="kw">c</span>(<span class="st">&quot;.value&quot;</span>, <span class="st">&quot;set&quot;</span>), </span>
<span id="cb12-4"><a href="#cb12-4"></a>    <span class="dt">names_pattern =</span> <span class="st">&quot;(.)(.)&quot;</span></span>
<span id="cb12-5"><a href="#cb12-5"></a>  ) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb12-6"><a href="#cb12-6"></a><span class="st">  </span><span class="kw">arrange</span>(set)</span>
<span id="cb12-7"><a href="#cb12-7"></a><span class="co">## # A tibble: 44 x 3</span></span>
<span id="cb12-8"><a href="#cb12-8"></a><span class="co">##    set       x     y</span></span>
<span id="cb12-9"><a href="#cb12-9"></a><span class="co">##    &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span>
<span id="cb12-10"><a href="#cb12-10"></a><span class="co">##  1 1        10  8.04</span></span>
<span id="cb12-11"><a href="#cb12-11"></a><span class="co">##  2 1         8  6.95</span></span>
<span id="cb12-12"><a href="#cb12-12"></a><span class="co">##  3 1        13  7.58</span></span>
<span id="cb12-13"><a href="#cb12-13"></a><span class="co">##  4 1         9  8.81</span></span>
<span id="cb12-14"><a href="#cb12-14"></a><span class="co">##  5 1        11  8.33</span></span>
<span id="cb12-15"><a href="#cb12-15"></a><span class="co">##  6 1        14  9.96</span></span>
<span id="cb12-16"><a href="#cb12-16"></a><span class="co">##  7 1         6  7.24</span></span>
<span id="cb12-17"><a href="#cb12-17"></a><span class="co">##  8 1         4  4.26</span></span>
<span id="cb12-18"><a href="#cb12-18"></a><span class="co">##  9 1        12 10.8 </span></span>
<span id="cb12-19"><a href="#cb12-19"></a><span class="co">## 10 1         7  4.82</span></span>
<span id="cb12-20"><a href="#cb12-20"></a><span class="co">## # … with 34 more rows</span></span></code></pre></div>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1"></a>pnl &lt;-<span class="st"> </span><span class="kw">tibble</span>(</span>
<span id="cb13-2"><a href="#cb13-2"></a>  <span class="dt">x =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">4</span>,</span>
<span id="cb13-3"><a href="#cb13-3"></a>  <span class="dt">a =</span> <span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">1</span>, <span class="dv">0</span>, <span class="dv">0</span>),</span>
<span id="cb13-4"><a href="#cb13-4"></a>  <span class="dt">b =</span> <span class="kw">c</span>(<span class="dv">0</span>, <span class="dv">1</span>, <span class="dv">1</span>, <span class="dv">1</span>),</span>
<span id="cb13-5"><a href="#cb13-5"></a>  <span class="dt">y1 =</span> <span class="kw">rnorm</span>(<span class="dv">4</span>),</span>
<span id="cb13-6"><a href="#cb13-6"></a>  <span class="dt">y2 =</span> <span class="kw">rnorm</span>(<span class="dv">4</span>),</span>
<span id="cb13-7"><a href="#cb13-7"></a>  <span class="dt">z1 =</span> <span class="kw">rep</span>(<span class="dv">3</span>, <span class="dv">4</span>),</span>
<span id="cb13-8"><a href="#cb13-8"></a>  <span class="dt">z2 =</span> <span class="kw">rep</span>(<span class="op">-</span><span class="dv">2</span>, <span class="dv">4</span>),</span>
<span id="cb13-9"><a href="#cb13-9"></a>)</span>
<span id="cb13-10"><a href="#cb13-10"></a></span>
<span id="cb13-11"><a href="#cb13-11"></a>pnl <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb13-12"><a href="#cb13-12"></a><span class="st">  </span><span class="kw">pivot_longer</span>(</span>
<span id="cb13-13"><a href="#cb13-13"></a>    <span class="op">!</span><span class="kw">c</span>(x, a, b), </span>
<span id="cb13-14"><a href="#cb13-14"></a>    <span class="dt">names_to =</span> <span class="kw">c</span>(<span class="st">&quot;.value&quot;</span>, <span class="st">&quot;time&quot;</span>), </span>
<span id="cb13-15"><a href="#cb13-15"></a>    <span class="dt">names_pattern =</span> <span class="st">&quot;(.)(.)&quot;</span></span>
<span id="cb13-16"><a href="#cb13-16"></a>  )</span>
<span id="cb13-17"><a href="#cb13-17"></a><span class="co">## # A tibble: 8 x 6</span></span>
<span id="cb13-18"><a href="#cb13-18"></a><span class="co">##       x     a     b time        y     z</span></span>
<span id="cb13-19"><a href="#cb13-19"></a><span class="co">##   &lt;int&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;   &lt;dbl&gt; &lt;dbl&gt;</span></span>
<span id="cb13-20"><a href="#cb13-20"></a><span class="co">## 1     1     1     0 1      0.0766     3</span></span>
<span id="cb13-21"><a href="#cb13-21"></a><span class="co">## 2     1     1     0 2      1.47      -2</span></span>
<span id="cb13-22"><a href="#cb13-22"></a><span class="co">## 3     2     1     1 1     -0.0282     3</span></span>
<span id="cb13-23"><a href="#cb13-23"></a><span class="co">## 4     2     1     1 2      1.36      -2</span></span>
<span id="cb13-24"><a href="#cb13-24"></a><span class="co">## 5     3     0     1 1     -1.10       3</span></span>
<span id="cb13-25"><a href="#cb13-25"></a><span class="co">## 6     3     0     1 2      0.498     -2</span></span>
<span id="cb13-26"><a href="#cb13-26"></a><span class="co">## 7     4     0     1 1     -2.42       3</span></span>
<span id="cb13-27"><a href="#cb13-27"></a><span class="co">## 8     4     0     1 2     -0.705     -2</span></span></code></pre></div>
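<p>The pattern <code>"(.)(.)"</code> works above only because every column name is exactly two characters long. When the parts of each name are joined by a delimiter, <code>names_sep</code> can be used in place of <code>names_pattern</code>. Here is a minimal sketch, assuming a hypothetical panel <code>pnl2</code> whose columns follow a <code>measure_time</code> naming scheme:</p>

```r
library(tidyr)
library(dplyr)

# Hypothetical panel: each column name is <measure>_<time>
pnl2 <- tibble(
  id  = 1:2,
  y_1 = c(0.5, 1.5),
  y_2 = c(0.7, 1.7),
  z_1 = c(3, 3),
  z_2 = c(-2, -2)
)

# ".value" keeps the measure part (y, z) as column names;
# the time part goes into a new `time` column
long <- pnl2 %>%
  pivot_longer(
    -id,
    names_to  = c(".value", "time"),
    names_sep = "_"
  )
long
```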
</div>
<div id="重复列名" class="section level4">
<h4>Duplicate column names</h4>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1"></a>df &lt;-<span class="st"> </span><span class="kw">tibble</span>(<span class="dt">id =</span> <span class="dv">1</span><span class="op">:</span><span class="dv">3</span>, <span class="dt">y =</span> <span class="dv">4</span><span class="op">:</span><span class="dv">6</span>, <span class="dt">y =</span> <span class="dv">5</span><span class="op">:</span><span class="dv">7</span>, <span class="dt">y =</span> <span class="dv">7</span><span class="op">:</span><span class="dv">9</span>, <span class="dt">.name_repair =</span> <span class="st">&quot;minimal&quot;</span>)</span>
<span id="cb14-2"><a href="#cb14-2"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_longer</span>(<span class="op">!</span>id, <span class="dt">names_to =</span> <span class="st">&quot;name&quot;</span>, <span class="dt">values_to =</span> <span class="st">&quot;value&quot;</span>)</span>
<span id="cb14-3"><a href="#cb14-3"></a><span class="co">## # A tibble: 9 x 3</span></span>
<span id="cb14-4"><a href="#cb14-4"></a><span class="co">##      id name  value</span></span>
<span id="cb14-5"><a href="#cb14-5"></a><span class="co">##   &lt;int&gt; &lt;chr&gt; &lt;int&gt;</span></span>
<span id="cb14-6"><a href="#cb14-6"></a><span class="co">## 1     1 y         4</span></span>
<span id="cb14-7"><a href="#cb14-7"></a><span class="co">## 2     1 y         5</span></span>
<span id="cb14-8"><a href="#cb14-8"></a><span class="co">## 3     1 y         7</span></span>
<span id="cb14-9"><a href="#cb14-9"></a><span class="co">## 4     2 y         5</span></span>
<span id="cb14-10"><a href="#cb14-10"></a><span class="co">## 5     2 y         6</span></span>
<span id="cb14-11"><a href="#cb14-11"></a><span class="co">## 6     2 y         8</span></span>
<span id="cb14-12"><a href="#cb14-12"></a><span class="co">## 7     3 y         6</span></span>
<span id="cb14-13"><a href="#cb14-13"></a><span class="co">## 8     3 y         7</span></span>
<span id="cb14-14"><a href="#cb14-14"></a><span class="co">## 9     3 y         9</span></span></code></pre></div>
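<p>The <code>.name_repair = "minimal"</code> argument is what allows the duplicated names here in the first place. The following sketch is a quick check (the exact error message depends on the installed tibble version):</p>

```r
library(tidyr)
library(dplyr)

# Default name repair ("check_unique") rejects duplicated column names
res <- tryCatch(
  tibble(id = 1:3, y = 4:6, y = 5:7),
  error = function(e) e
)
inherits(res, "error")

# With "minimal" repair the duplicated names are kept as-is
df_dup <- tibble(id = 1:3, y = 4:6, y = 5:7, .name_repair = "minimal")
names(df_dup)
```

<p>Since the <code>name</code> column in the long result then contains identical labels, widening it back cannot recover the separate original columns unless extra information such as the column position is kept.</p>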
</div>
</div>
<div id="长转宽" class="section level3">
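<h3>Long to wide</h3>
<p><code>pivot_wider()</code> is the inverse of <code>pivot_longer()</code>. It makes a dataset wider by increasing the number of columns and decreasing the number of rows. It is typically used when summarising data, and produces results similar to an Excel pivot table.</p>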
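<p>Here is a minimal sketch of the inverse relationship, using a small hypothetical table <code>wide</code> with one column per year. Pivoting it longer and then wider again recovers the original layout.</p>

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one row per id, one column per year
wide <- tibble(id = 1:2, `2020` = c(10, 20), `2021` = c(11, 21))

# Wide -> long: the year headers become values of a `year` column
long <- wide %>%
  pivot_longer(-id, names_to = "year", values_to = "value")

# Long -> wide: pivot_wider() turns the `year` values back into headers
back <- long %>%
  pivot_wider(names_from = year, values_from = value)
back
```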
<div id="基础-1" class="section level4">
<h4>Basics</h4>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1"></a>fish_encounters <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_wider</span>(<span class="dt">names_from =</span> station, <span class="dt">values_from =</span> seen)</span>
<span id="cb15-2"><a href="#cb15-2"></a><span class="co">## # A tibble: 19 x 12</span></span>
<span id="cb15-3"><a href="#cb15-3"></a><span class="co">##    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW</span></span>
<span id="cb15-4"><a href="#cb15-4"></a><span class="co">##    &lt;fct&gt;   &lt;int&gt; &lt;int&gt;  &lt;int&gt; &lt;int&gt;   &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt;</span></span>
<span id="cb15-5"><a href="#cb15-5"></a><span class="co">##  1 4842        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb15-6"><a href="#cb15-6"></a><span class="co">##  2 4843        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb15-7"><a href="#cb15-7"></a><span class="co">##  3 4844        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb15-8"><a href="#cb15-8"></a><span class="co">##  4 4845        1     1      1     1       1    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-9"><a href="#cb15-9"></a><span class="co">##  5 4847        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-10"><a href="#cb15-10"></a><span class="co">##  6 4848        1     1      1     1      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-11"><a href="#cb15-11"></a><span class="co">##  7 4849        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-12"><a href="#cb15-12"></a><span class="co">##  8 4850        1     1     NA     1       1     1     1    NA    NA    NA    NA</span></span>
<span id="cb15-13"><a href="#cb15-13"></a><span class="co">##  9 4851        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-14"><a href="#cb15-14"></a><span class="co">## 10 4854        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-15"><a href="#cb15-15"></a><span class="co">## 11 4855        1     1      1     1       1    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-16"><a href="#cb15-16"></a><span class="co">## 12 4857        1     1      1     1       1     1     1     1     1    NA    NA</span></span>
<span id="cb15-17"><a href="#cb15-17"></a><span class="co">## 13 4858        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb15-18"><a href="#cb15-18"></a><span class="co">## 14 4859        1     1      1     1       1    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-19"><a href="#cb15-19"></a><span class="co">## 15 4861        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb15-20"><a href="#cb15-20"></a><span class="co">## 16 4862        1     1      1     1       1     1     1     1     1    NA    NA</span></span>
<span id="cb15-21"><a href="#cb15-21"></a><span class="co">## 17 4863        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-22"><a href="#cb15-22"></a><span class="co">## 18 4864        1     1     NA    NA      NA    NA    NA    NA    NA    NA    NA</span></span>
<span id="cb15-23"><a href="#cb15-23"></a><span class="co">## 19 4865        1     1      1    NA      NA    NA    NA    NA    NA    NA    NA</span></span></code></pre></div>
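<p>Filling missing values: by default, fish/station combinations that never occur in the data become <code>NA</code>. <code>values_fill</code> replaces them with a chosen value (here 0).</p>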
<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1"></a>fish_encounters <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_wider</span>(</span>
<span id="cb16-2"><a href="#cb16-2"></a>  <span class="dt">names_from =</span> station, </span>
<span id="cb16-3"><a href="#cb16-3"></a>  <span class="dt">values_from =</span> seen,</span>
<span id="cb16-4"><a href="#cb16-4"></a>  <span class="dt">values_fill =</span> <span class="dv">0</span></span>
<span id="cb16-5"><a href="#cb16-5"></a>)</span>
<span id="cb16-6"><a href="#cb16-6"></a><span class="co">## # A tibble: 19 x 12</span></span>
<span id="cb16-7"><a href="#cb16-7"></a><span class="co">##    fish  Release I80_1 Lisbon  Rstr Base_TD   BCE   BCW  BCE2  BCW2   MAE   MAW</span></span>
<span id="cb16-8"><a href="#cb16-8"></a><span class="co">##    &lt;fct&gt;   &lt;int&gt; &lt;int&gt;  &lt;int&gt; &lt;int&gt;   &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt; &lt;int&gt;</span></span>
<span id="cb16-9"><a href="#cb16-9"></a><span class="co">##  1 4842        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb16-10"><a href="#cb16-10"></a><span class="co">##  2 4843        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb16-11"><a href="#cb16-11"></a><span class="co">##  3 4844        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb16-12"><a href="#cb16-12"></a><span class="co">##  4 4845        1     1      1     1       1     0     0     0     0     0     0</span></span>
<span id="cb16-13"><a href="#cb16-13"></a><span class="co">##  5 4847        1     1      1     0       0     0     0     0     0     0     0</span></span>
<span id="cb16-14"><a href="#cb16-14"></a><span class="co">##  6 4848        1     1      1     1       0     0     0     0     0     0     0</span></span>
<span id="cb16-15"><a href="#cb16-15"></a><span class="co">##  7 4849        1     1      0     0       0     0     0     0     0     0     0</span></span>
<span id="cb16-16"><a href="#cb16-16"></a><span class="co">##  8 4850        1     1      0     1       1     1     1     0     0     0     0</span></span>
<span id="cb16-17"><a href="#cb16-17"></a><span class="co">##  9 4851        1     1      0     0       0     0     0     0     0     0     0</span></span>
<span id="cb16-18"><a href="#cb16-18"></a><span class="co">## 10 4854        1     1      0     0       0     0     0     0     0     0     0</span></span>
<span id="cb16-19"><a href="#cb16-19"></a><span class="co">## 11 4855        1     1      1     1       1     0     0     0     0     0     0</span></span>
<span id="cb16-20"><a href="#cb16-20"></a><span class="co">## 12 4857        1     1      1     1       1     1     1     1     1     0     0</span></span>
<span id="cb16-21"><a href="#cb16-21"></a><span class="co">## 13 4858        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb16-22"><a href="#cb16-22"></a><span class="co">## 14 4859        1     1      1     1       1     0     0     0     0     0     0</span></span>
<span id="cb16-23"><a href="#cb16-23"></a><span class="co">## 15 4861        1     1      1     1       1     1     1     1     1     1     1</span></span>
<span id="cb16-24"><a href="#cb16-24"></a><span class="co">## 16 4862        1     1      1     1       1     1     1     1     1     0     0</span></span>
<span id="cb16-25"><a href="#cb16-25"></a><span class="co">## 17 4863        1     1      0     0       0     0     0     0     0     0     0</span></span>
<span id="cb16-26"><a href="#cb16-26"></a><span class="co">## 18 4864        1     1      0     0       0     0     0     0     0     0     0</span></span>
<span id="cb16-27"><a href="#cb16-27"></a><span class="co">## 19 4865        1     1      1     0       0     0     0     0     0     0     0</span></span></code></pre></div>
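<p><code>values_fill</code> only fills combinations that are absent from the long data. An <code>NA</code> that is already present in the value column is left as it is. The sketch below uses a hypothetical table <code>obs</code> to show this:</p>

```r
library(tidyr)
library(dplyr)

# Hypothetical observations:
# - fish f1 has an explicit NA at station S2
# - fish f2 has no row at all for station S2 (implicitly missing)
obs <- tibble(
  fish    = c("f1", "f1", "f2"),
  station = c("S1", "S2", "S1"),
  seen    = c(1L, NA, 1L)
)

wide <- obs %>%
  pivot_wider(names_from = station, values_from = seen, values_fill = 0L)
wide
# f1/S2 stays NA (explicit missing); f2/S2 is filled with 0 (implicit missing)
```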
</div>
<div id="聚合" class="section level4">
<h4>Aggregation</h4>
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1"></a>warpbreaks &lt;-<span class="st"> </span>warpbreaks <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">as_tibble</span>() </span>
<span id="cb17-2"><a href="#cb17-2"></a>warpbreaks <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">count</span>(wool, tension)</span>
<span id="cb17-3"><a href="#cb17-3"></a><span class="co">## # A tibble: 6 x 3</span></span>
<span id="cb17-4"><a href="#cb17-4"></a><span class="co">##   wool  tension     n</span></span>
<span id="cb17-5"><a href="#cb17-5"></a><span class="co">##   &lt;fct&gt; &lt;fct&gt;   &lt;int&gt;</span></span>
<span id="cb17-6"><a href="#cb17-6"></a><span class="co">## 1 A     L           9</span></span>
<span id="cb17-7"><a href="#cb17-7"></a><span class="co">## 2 A     M           9</span></span>
<span id="cb17-8"><a href="#cb17-8"></a><span class="co">## 3 A     H           9</span></span>
<span id="cb17-9"><a href="#cb17-9"></a><span class="co">## 4 B     L           9</span></span>
<span id="cb17-10"><a href="#cb17-10"></a><span class="co">## 5 B     M           9</span></span>
<span id="cb17-11"><a href="#cb17-11"></a><span class="co">## 6 B     H           9</span></span></code></pre></div>
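<p>Each <code>wool</code>/<code>tension</code> combination occurs 9 times, so the values do not uniquely identify a cell. Without an aggregation function, <code>pivot_wider()</code> warns and returns list-columns. Use <code>values_fn</code> to specify how the values should be aggregated:</p>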
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1"></a>warpbreaks <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_wider</span>(<span class="dt">names_from =</span> wool, <span class="dt">values_from =</span> breaks,<span class="dt">values_fn=</span> <span class="kw">list</span>(<span class="dt">breaks =</span> sum))</span>
<span id="cb18-2"><a href="#cb18-2"></a><span class="co">## # A tibble: 3 x 3</span></span>
<span id="cb18-3"><a href="#cb18-3"></a><span class="co">##   tension     A     B</span></span>
<span id="cb18-4"><a href="#cb18-4"></a><span class="co">##   &lt;fct&gt;   &lt;dbl&gt; &lt;dbl&gt;</span></span>
<span id="cb18-5"><a href="#cb18-5"></a><span class="co">## 1 L         401   254</span></span>
<span id="cb18-6"><a href="#cb18-6"></a><span class="co">## 2 M         216   259</span></span>
<span id="cb18-7"><a href="#cb18-7"></a><span class="co">## 3 H         221   169</span></span></code></pre></div>
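<p>Another way to get the same result is to aggregate first with <code>dplyr</code>, so that each cell has exactly one value, and then widen. The sketch below builds both versions from a fresh copy <code>wb</code> of the built-in data so they can be compared:</p>

```r
library(tidyr)
library(dplyr)

wb <- as_tibble(datasets::warpbreaks)

# Route 1: aggregate inside pivot_wider()
via_fn <- wb %>%
  pivot_wider(names_from = wool, values_from = breaks,
              values_fn = list(breaks = sum))

# Route 2: summarise first (one row per tension/wool), then widen
via_summary <- wb %>%
  group_by(tension, wool) %>%
  summarise(breaks = sum(breaks), .groups = "drop") %>%
  pivot_wider(names_from = wool, values_from = breaks)

via_summary
```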
</div>
<div id="从多个变量生成新列名" class="section level4">
<h4>Generating column names from multiple variables</h4>
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1"></a>production &lt;-<span class="st"> </span><span class="kw">expand_grid</span>(</span>
<span id="cb19-2"><a href="#cb19-2"></a>    <span class="dt">product =</span> <span class="kw">c</span>(<span class="st">&quot;A&quot;</span>, <span class="st">&quot;B&quot;</span>), </span>
<span id="cb19-3"><a href="#cb19-3"></a>    <span class="dt">country =</span> <span class="kw">c</span>(<span class="st">&quot;AI&quot;</span>, <span class="st">&quot;EI&quot;</span>), </span>
<span id="cb19-4"><a href="#cb19-4"></a>    <span class="dt">year =</span> <span class="dv">2000</span><span class="op">:</span><span class="dv">2014</span></span>
<span id="cb19-5"><a href="#cb19-5"></a>  ) <span class="op">%&gt;%</span></span>
<span id="cb19-6"><a href="#cb19-6"></a><span class="st">  </span><span class="kw">filter</span>((product <span class="op">==</span><span class="st"> &quot;A&quot;</span> <span class="op">&amp;</span><span class="st"> </span>country <span class="op">==</span><span class="st"> &quot;AI&quot;</span>) <span class="op">|</span><span class="st"> </span>product <span class="op">==</span><span class="st"> &quot;B&quot;</span>) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb19-7"><a href="#cb19-7"></a><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">production =</span> <span class="kw">rnorm</span>(<span class="kw">nrow</span>(.)))</span>
<span id="cb19-8"><a href="#cb19-8"></a>production</span>
<span id="cb19-9"><a href="#cb19-9"></a><span class="co">## # A tibble: 45 x 4</span></span>
<span id="cb19-10"><a href="#cb19-10"></a><span class="co">##    product country  year production</span></span>
<span id="cb19-11"><a href="#cb19-11"></a><span class="co">##    &lt;chr&gt;   &lt;chr&gt;   &lt;int&gt;      &lt;dbl&gt;</span></span>
<span id="cb19-12"><a href="#cb19-12"></a><span class="co">##  1 A       AI       2000   -0.288  </span></span>
<span id="cb19-13"><a href="#cb19-13"></a><span class="co">##  2 A       AI       2001   -0.00118</span></span>
<span id="cb19-14"><a href="#cb19-14"></a><span class="co">##  3 A       AI       2002    0.186  </span></span>
<span id="cb19-15"><a href="#cb19-15"></a><span class="co">##  4 A       AI       2003   -1.42   </span></span>
<span id="cb19-16"><a href="#cb19-16"></a><span class="co">##  5 A       AI       2004   -0.399  </span></span>
<span id="cb19-17"><a href="#cb19-17"></a><span class="co">##  6 A       AI       2005    0.901  </span></span>
<span id="cb19-18"><a href="#cb19-18"></a><span class="co">##  7 A       AI       2006   -0.621  </span></span>
<span id="cb19-19"><a href="#cb19-19"></a><span class="co">##  8 A       AI       2007   -0.790  </span></span>
<span id="cb19-20"><a href="#cb19-20"></a><span class="co">##  9 A       AI       2008   -0.990  </span></span>
<span id="cb19-21"><a href="#cb19-21"></a><span class="co">## 10 A       AI       2009    1.46   </span></span>
<span id="cb19-22"><a href="#cb19-22"></a><span class="co">## # … with 35 more rows</span></span></code></pre></div>
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb20-1"><a href="#cb20-1"></a>production <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_wider</span>(</span>
<span id="cb20-2"><a href="#cb20-2"></a>  <span class="dt">names_from =</span> <span class="kw">c</span>(product, country), </span>
<span id="cb20-3"><a href="#cb20-3"></a>  <span class="dt">values_from =</span> production</span>
<span id="cb20-4"><a href="#cb20-4"></a>)</span>
<span id="cb20-5"><a href="#cb20-5"></a><span class="co">## # A tibble: 15 x 4</span></span>
<span id="cb20-6"><a href="#cb20-6"></a><span class="co">##     year     A_AI    B_AI    B_EI</span></span>
<span id="cb20-7"><a href="#cb20-7"></a><span class="co">##    &lt;int&gt;    &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;</span></span>
<span id="cb20-8"><a href="#cb20-8"></a><span class="co">##  1  2000 -0.288   -0.117  -0.0335</span></span>
<span id="cb20-9"><a href="#cb20-9"></a><span class="co">##  2  2001 -0.00118  1.39   -0.0998</span></span>
<span id="cb20-10"><a href="#cb20-10"></a><span class="co">##  3  2002  0.186   -0.158  -0.198 </span></span>
<span id="cb20-11"><a href="#cb20-11"></a><span class="co">##  4  2003 -1.42    -0.386  -1.38  </span></span>
<span id="cb20-12"><a href="#cb20-12"></a><span class="co">##  5  2004 -0.399    2.18    0.0948</span></span>
<span id="cb20-13"><a href="#cb20-13"></a><span class="co">##  6  2005  0.901    0.213  -0.520 </span></span>
<span id="cb20-14"><a href="#cb20-14"></a><span class="co">##  7  2006 -0.621    0.754   0.0250</span></span>
<span id="cb20-15"><a href="#cb20-15"></a><span class="co">##  8  2007 -0.790   -2.77    0.0711</span></span>
<span id="cb20-16"><a href="#cb20-16"></a><span class="co">##  9  2008 -0.990    0.368  -0.381 </span></span>
<span id="cb20-17"><a href="#cb20-17"></a><span class="co">## 10  2009  1.46     0.0640 -0.150 </span></span>
<span id="cb20-18"><a href="#cb20-18"></a><span class="co">## 11  2010  0.816    1.25    0.309 </span></span>
<span id="cb20-19"><a href="#cb20-19"></a><span class="co">## 12  2011 -0.432    1.80    0.472 </span></span>
<span id="cb20-20"><a href="#cb20-20"></a><span class="co">## 13  2012 -1.07     0.851  -0.0717</span></span>
<span id="cb20-21"><a href="#cb20-21"></a><span class="co">## 14  2013  1.36    -1.18   -0.970 </span></span>
<span id="cb20-22"><a href="#cb20-22"></a><span class="co">## 15  2014  0.516    1.27    0.703</span></span></code></pre></div>
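<p>The new column names can be controlled with the <code>names_sep</code> and <code>names_prefix</code> arguments, or with a <code>names_glue</code> template that refers to the <code>names_from</code> columns by name:</p>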
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1"></a>production <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_wider</span>(</span>
<span id="cb21-2"><a href="#cb21-2"></a>  <span class="dt">names_from =</span> <span class="kw">c</span>(product, country), </span>
<span id="cb21-3"><a href="#cb21-3"></a>  <span class="dt">values_from =</span> production,</span>
<span id="cb21-4"><a href="#cb21-4"></a>  <span class="dt">names_sep =</span> <span class="st">&quot;.&quot;</span>,</span>
<span id="cb21-5"><a href="#cb21-5"></a>  <span class="dt">names_prefix =</span> <span class="st">&quot;prod.&quot;</span></span>
<span id="cb21-6"><a href="#cb21-6"></a>)</span>
<span id="cb21-7"><a href="#cb21-7"></a><span class="co">## # A tibble: 15 x 4</span></span>
<span id="cb21-8"><a href="#cb21-8"></a><span class="co">##     year prod.A.AI prod.B.AI prod.B.EI</span></span>
<span id="cb21-9"><a href="#cb21-9"></a><span class="co">##    &lt;int&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;</span></span>
<span id="cb21-10"><a href="#cb21-10"></a><span class="co">##  1  2000  -0.288     -0.117    -0.0335</span></span>
<span id="cb21-11"><a href="#cb21-11"></a><span class="co">##  2  2001  -0.00118    1.39     -0.0998</span></span>
<span id="cb21-12"><a href="#cb21-12"></a><span class="co">##  3  2002   0.186     -0.158    -0.198 </span></span>
<span id="cb21-13"><a href="#cb21-13"></a><span class="co">##  4  2003  -1.42      -0.386    -1.38  </span></span>
<span id="cb21-14"><a href="#cb21-14"></a><span class="co">##  5  2004  -0.399      2.18      0.0948</span></span>
<span id="cb21-15"><a href="#cb21-15"></a><span class="co">##  6  2005   0.901      0.213    -0.520 </span></span>
<span id="cb21-16"><a href="#cb21-16"></a><span class="co">##  7  2006  -0.621      0.754     0.0250</span></span>
<span id="cb21-17"><a href="#cb21-17"></a><span class="co">##  8  2007  -0.790     -2.77      0.0711</span></span>
<span id="cb21-18"><a href="#cb21-18"></a><span class="co">##  9  2008  -0.990      0.368    -0.381 </span></span>
<span id="cb21-19"><a href="#cb21-19"></a><span class="co">## 10  2009   1.46       0.0640   -0.150 </span></span>
<span id="cb21-20"><a href="#cb21-20"></a><span class="co">## 11  2010   0.816      1.25      0.309 </span></span>
<span id="cb21-21"><a href="#cb21-21"></a><span class="co">## 12  2011  -0.432      1.80      0.472 </span></span>
<span id="cb21-22"><a href="#cb21-22"></a><span class="co">## 13  2012  -1.07       0.851    -0.0717</span></span>
<span id="cb21-23"><a href="#cb21-23"></a><span class="co">## 14  2013   1.36      -1.18     -0.970 </span></span>
<span id="cb21-24"><a href="#cb21-24"></a><span class="co">## 15  2014   0.516      1.27      0.703</span></span></code></pre></div>
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1"></a>production <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">pivot_wider</span>(</span>
<span id="cb22-2"><a href="#cb22-2"></a>  <span class="dt">names_from =</span> <span class="kw">c</span>(product, country), </span>
<span id="cb22-3"><a href="#cb22-3"></a>  <span class="dt">values_from =</span> production,</span>
<span id="cb22-4"><a href="#cb22-4"></a>  <span class="dt">names_glue =</span> <span class="st">&quot;prod_{product}_{country}&quot;</span></span>
<span id="cb22-5"><a href="#cb22-5"></a>)</span>
<span id="cb22-6"><a href="#cb22-6"></a><span class="co">## # A tibble: 15 x 4</span></span>
<span id="cb22-7"><a href="#cb22-7"></a><span class="co">##     year prod_A_AI prod_B_AI prod_B_EI</span></span>
<span id="cb22-8"><a href="#cb22-8"></a><span class="co">##    &lt;int&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;</span></span>
<span id="cb22-9"><a href="#cb22-9"></a><span class="co">##  1  2000  -0.288     -0.117    -0.0335</span></span>
<span id="cb22-10"><a href="#cb22-10"></a><span class="co">##  2  2001  -0.00118    1.39     -0.0998</span></span>
<span id="cb22-11"><a href="#cb22-11"></a><span class="co">##  3  2002   0.186     -0.158    -0.198 </span></span>
<span id="cb22-12"><a href="#cb22-12"></a><span class="co">##  4  2003  -1.42      -0.386    -1.38  </span></span>
<span id="cb22-13"><a href="#cb22-13"></a><span class="co">##  5  2004  -0.399      2.18      0.0948</span></span>
<span id="cb22-14"><a href="#cb22-14"></a><span class="co">##  6  2005   0.901      0.213    -0.520 </span></span>
<span id="cb22-15"><a href="#cb22-15"></a><span class="co">##  7  2006  -0.621      0.754     0.0250</span></span>
<span id="cb22-16"><a href="#cb22-16"></a><span class="co">##  8  2007  -0.790     -2.77      0.0711</span></span>
<span id="cb22-17"><a href="#cb22-17"></a><span class="co">##  9  2008  -0.990      0.368    -0.381 </span></span>
<span id="cb22-18"><a href="#cb22-18"></a><span class="co">## 10  2009   1.46       0.0640   -0.150 </span></span>
<span id="cb22-19"><a href="#cb22-19"></a><span class="co">## 11  2010   0.816      1.25      0.309 </span></span>
<span id="cb22-20"><a href="#cb22-20"></a><span class="co">## 12  2011  -0.432      1.80      0.472 </span></span>
<span id="cb22-21"><a href="#cb22-21"></a><span class="co">## 13  2012  -1.07       0.851    -0.0717</span></span>
<span id="cb22-22"><a href="#cb22-22"></a><span class="co">## 14  2013   1.36      -1.18     -0.970 </span></span>
<span id="cb22-23"><a href="#cb22-23"></a><span class="co">## 15  2014   0.516      1.27      0.703</span></span></code></pre></div>
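<p>The prefix and separator can also be undone. In <code>pivot_longer()</code>, <code>names_prefix</code> strips the prefix and <code>names_sep</code> splits the remainder. The sketch below uses a hypothetical three-row table <code>toy</code>. It separates with <code>"_"</code> because <code>names_sep</code> in <code>pivot_longer()</code> is treated as a regular expression, where <code>"."</code> would match any character.</p>

```r
library(tidyr)
library(dplyr)

# Hypothetical data: one production value per product/country in one year
toy <- tibble(
  product    = c("A", "B", "B"),
  country    = c("AI", "AI", "EI"),
  year       = 2000L,
  production = c(1.5, 2.5, 3.5)
)

wide <- toy %>%
  pivot_wider(
    names_from   = c(product, country),
    values_from  = production,
    names_sep    = "_",
    names_prefix = "prod_"
  )

# Reverse: strip "prod_", then split "A_AI" into product and country
long <- wide %>%
  pivot_longer(
    -year,
    names_to     = c("product", "country"),
    names_prefix = "prod_",
    names_sep    = "_",
    values_to    = "production"
  )
long
```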
</div>
<div id="多值变宽" class="section level4">
<h4>Widening multiple value columns</h4>
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="#cb23-1"></a>us_rent_income <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb23-2"><a href="#cb23-2"></a><span class="st">  </span><span class="kw">pivot_wider</span>(<span class="dt">names_from =</span> variable, <span class="dt">values_from =</span> <span class="kw">c</span>(estimate, moe))</span>
<span id="cb23-3"><a href="#cb23-3"></a><span class="co">## # A tibble: 52 x 6</span></span>
<span id="cb23-4"><a href="#cb23-4"></a><span class="co">##    GEOID NAME                 estimate_income estimate_rent moe_income moe_rent</span></span>
<span id="cb23-5"><a href="#cb23-5"></a><span class="co">##    &lt;chr&gt; &lt;chr&gt;                          &lt;dbl&gt;         &lt;dbl&gt;      &lt;dbl&gt;    &lt;dbl&gt;</span></span>
<span id="cb23-6"><a href="#cb23-6"></a><span class="co">##  1 01    Alabama                        24476           747        136        3</span></span>
<span id="cb23-7"><a href="#cb23-7"></a><span class="co">##  2 02    Alaska                         32940          1200        508       13</span></span>
<span id="cb23-8"><a href="#cb23-8"></a><span class="co">##  3 04    Arizona                        27517           972        148        4</span></span>
<span id="cb23-9"><a href="#cb23-9"></a><span class="co">##  4 05    Arkansas                       23789           709        165        5</span></span>
<span id="cb23-10"><a href="#cb23-10"></a><span class="co">##  5 06    California                     29454          1358        109        3</span></span>
<span id="cb23-11"><a href="#cb23-11"></a><span class="co">##  6 08    Colorado                       32401          1125        109        5</span></span>
<span id="cb23-12"><a href="#cb23-12"></a><span class="co">##  7 09    Connecticut                    35326          1123        195        5</span></span>
<span id="cb23-13"><a href="#cb23-13"></a><span class="co">##  8 10    Delaware                       31560          1076        247       10</span></span>
<span id="cb23-14"><a href="#cb23-14"></a><span class="co">##  9 11    District of Columbia           43198          1424        681       17</span></span>
<span id="cb23-15"><a href="#cb23-15"></a><span class="co">## 10 12    Florida                        25952          1077         70        3</span></span>
<span id="cb23-16"><a href="#cb23-16"></a><span class="co">## # … with 42 more rows</span></span></code></pre></div>
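<p>With several <code>values_from</code> columns, the default names take the form <code>{value column}_{names_from value}</code>, for example <code>estimate_income</code>. A <code>names_glue</code> template can use the special <code>.value</code> placeholder to reverse that order:</p>

```r
library(tidyr)
library(dplyr)

# us_rent_income ships with tidyr; {.value} stands for the values_from column name
wide <- us_rent_income %>%
  pivot_wider(
    names_from  = variable,
    names_glue  = "{variable}_{.value}",
    values_from = c(estimate, moe)
  )
names(wide)
```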
</div>
</div>
<div id="处理jsonhtml的数据" class="section level3">
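<h3>Handling JSON and HTML data</h3>
<p>This is not needed very often in day-to-day work. When it is, a dedicated package such as <code>jsonlite</code> usually handles the parsing.</p>
<p>See <code>vignette("rectangle")</code> for a detailed walkthrough of turning such nested lists into tidy tibbles.</p>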
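<p>The core idea works without any external data. JSON parsers typically produce a list of named lists. Put that list into a list-column and <code>unnest_wider()</code> spreads it into one column per field. The records below are hypothetical, and a field that is missing from a record becomes <code>NA</code>:</p>

```r
library(tidyr)
library(dplyr)

# Hypothetical records shaped like parsed JSON objects
records <- list(
  list(name = "a", id = 1L),
  list(name = "b", id = 2L),
  list(name = "c")            # no `id` field
)

flat <- tibble(rec = records) %>%
  unnest_wider(rec)
flat
```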
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1"></a><span class="kw">library</span>(tidyr)</span>
<span id="cb24-2"><a href="#cb24-2"></a><span class="kw">library</span>(dplyr)</span>
<span id="cb24-3"><a href="#cb24-3"></a><span class="kw">library</span>(repurrrsive)</span></code></pre></div>
<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb25-1"><a href="#cb25-1"></a>users &lt;-<span class="st"> </span><span class="kw">tibble</span>(<span class="dt">user =</span> gh_users)</span>
<span id="cb25-2"><a href="#cb25-2"></a>users</span>
<span id="cb25-3"><a href="#cb25-3"></a><span class="co">## # A tibble: 6 x 1</span></span>
<span id="cb25-4"><a href="#cb25-4"></a><span class="co">##   user             </span></span>
<span id="cb25-5"><a href="#cb25-5"></a><span class="co">##   &lt;list&gt;           </span></span>
<span id="cb25-6"><a href="#cb25-6"></a><span class="co">## 1 &lt;named list [30]&gt;</span></span>
<span id="cb25-7"><a href="#cb25-7"></a><span class="co">## 2 &lt;named list [30]&gt;</span></span>
<span id="cb25-8"><a href="#cb25-8"></a><span class="co">## 3 &lt;named list [30]&gt;</span></span>
<span id="cb25-9"><a href="#cb25-9"></a><span class="co">## 4 &lt;named list [30]&gt;</span></span>
<span id="cb25-10"><a href="#cb25-10"></a><span class="co">## 5 &lt;named list [30]&gt;</span></span>
<span id="cb25-11"><a href="#cb25-11"></a><span class="co">## 6 &lt;named list [30]&gt;</span></span>
<span id="cb25-12"><a href="#cb25-12"></a>users <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">unnest_wider</span>(user)</span>
<span id="cb25-13"><a href="#cb25-13"></a><span class="co">## # A tibble: 6 x 30</span></span>
<span id="cb25-14"><a href="#cb25-14"></a><span class="co">##   login     id avatar_url gravatar_id url   html_url followers_url following_url</span></span>
<span id="cb25-15"><a href="#cb25-15"></a><span class="co">##   &lt;chr&gt;  &lt;int&gt; &lt;chr&gt;      &lt;chr&gt;       &lt;chr&gt; &lt;chr&gt;    &lt;chr&gt;         &lt;chr&gt;        </span></span>
<span id="cb25-16"><a href="#cb25-16"></a><span class="co">## 1 gabo… 6.60e5 https://a… &quot;&quot;          http… https:/… https://api.… https://api.…</span></span>
<span id="cb25-17"><a href="#cb25-17"></a><span class="co">## 2 jenn… 5.99e5 https://a… &quot;&quot;          http… https:/… https://api.… https://api.…</span></span>
<span id="cb25-18"><a href="#cb25-18"></a><span class="co">## 3 jtle… 1.57e6 https://a… &quot;&quot;          http… https:/… https://api.… https://api.…</span></span>
<span id="cb25-19"><a href="#cb25-19"></a><span class="co">## 4 juli… 1.25e7 https://a… &quot;&quot;          http… https:/… https://api.… https://api.…</span></span>
<span id="cb25-20"><a href="#cb25-20"></a><span class="co">## 5 leep… 3.51e6 https://a… &quot;&quot;          http… https:/… https://api.… https://api.…</span></span>
<span id="cb25-21"><a href="#cb25-21"></a><span class="co">## 6 masa… 8.36e6 https://a… &quot;&quot;          http… https:/… https://api.… https://api.…</span></span>
<span id="cb25-22"><a href="#cb25-22"></a><span class="co">## # … with 22 more variables: gists_url &lt;chr&gt;, starred_url &lt;chr&gt;,</span></span>
<span id="cb25-23"><a href="#cb25-23"></a><span class="co">## #   subscriptions_url &lt;chr&gt;, organizations_url &lt;chr&gt;, repos_url &lt;chr&gt;,</span></span>
<span id="cb25-24"><a href="#cb25-24"></a><span class="co">## #   events_url &lt;chr&gt;, received_events_url &lt;chr&gt;, type &lt;chr&gt;, site_admin &lt;lgl&gt;,</span></span>
<span id="cb25-25"><a href="#cb25-25"></a><span class="co">## #   name &lt;chr&gt;, company &lt;chr&gt;, blog &lt;chr&gt;, location &lt;chr&gt;, email &lt;chr&gt;,</span></span>
<span id="cb25-26"><a href="#cb25-26"></a><span class="co">## #   public_repos &lt;int&gt;, public_gists &lt;int&gt;, followers &lt;int&gt;, following &lt;int&gt;,</span></span>
<span id="cb25-27"><a href="#cb25-27"></a><span class="co">## #   created_at &lt;chr&gt;, updated_at &lt;chr&gt;, bio &lt;chr&gt;, hireable &lt;lgl&gt;</span></span></code></pre></div>
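<p>To see what <code>unnest_wider()</code> does without installing <code>repurrrsive</code>, here is a minimal sketch on two hand-made records. The data (<code>records</code>, <code>users_small</code>) is hypothetical and only mimics the shape of <code>gh_users</code>; the second record deliberately has a field the first one lacks.</p>

```r
library(tidyr)
library(tibble)

# Hypothetical records shaped like gh_users, but with only a few fields.
# The second record has an extra field ("name") that the first one lacks.
records <- list(
  list(login = "alice", id = 1L),
  list(login = "bob", id = 2L, name = "Bob")
)
users_small <- tibble(user = records)

# unnest_wider() turns every name found in the lists into its own column;
# a field missing from a record becomes NA
wide <- unnest_wider(users_small, user)
wide
```

<p>Every field becomes a column, so irregular records are handled without writing any loops.</p>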
</div>
<div id="嵌套数据" class="section level3">
<h3>Nested data</h3>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1"></a><span class="kw">library</span>(tidyr)</span>
<span id="cb26-2"><a href="#cb26-2"></a><span class="kw">library</span>(dplyr)</span>
<span id="cb26-3"><a href="#cb26-3"></a><span class="kw">library</span>(purrr)</span></code></pre></div>
<div id="基础-2" class="section level4">
<h4>Basics</h4>
<p>A nested data frame has a list-column whose elements are themselves data frames:</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="#cb27-1"></a>df1 &lt;-<span class="st"> </span><span class="kw">tibble</span>(</span>
<span id="cb27-2"><a href="#cb27-2"></a>  <span class="dt">g =</span> <span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>),</span>
<span id="cb27-3"><a href="#cb27-3"></a>  <span class="dt">data =</span> <span class="kw">list</span>(</span>
<span id="cb27-4"><a href="#cb27-4"></a>    <span class="kw">tibble</span>(<span class="dt">x =</span> <span class="dv">1</span>, <span class="dt">y =</span> <span class="dv">2</span>),</span>
<span id="cb27-5"><a href="#cb27-5"></a>    <span class="kw">tibble</span>(<span class="dt">x =</span> <span class="dv">4</span><span class="op">:</span><span class="dv">5</span>, <span class="dt">y =</span> <span class="dv">6</span><span class="op">:</span><span class="dv">7</span>),</span>
<span id="cb27-6"><a href="#cb27-6"></a>    <span class="kw">tibble</span>(<span class="dt">x =</span> <span class="dv">10</span>)</span>
<span id="cb27-7"><a href="#cb27-7"></a>  )</span>
<span id="cb27-8"><a href="#cb27-8"></a>)</span>
<span id="cb27-9"><a href="#cb27-9"></a>df1</span>
<span id="cb27-10"><a href="#cb27-10"></a><span class="co">## # A tibble: 3 x 2</span></span>
<span id="cb27-11"><a href="#cb27-11"></a><span class="co">##       g data                </span></span>
<span id="cb27-12"><a href="#cb27-12"></a><span class="co">##   &lt;dbl&gt; &lt;list&gt;              </span></span>
<span id="cb27-13"><a href="#cb27-13"></a><span class="co">## 1     1 &lt;tibble[,2] [1 × 2]&gt;</span></span>
<span id="cb27-14"><a href="#cb27-14"></a><span class="co">## 2     2 &lt;tibble[,2] [2 × 2]&gt;</span></span>
<span id="cb27-15"><a href="#cb27-15"></a><span class="co">## 3     3 &lt;tibble[,1] [1 × 1]&gt;</span></span></code></pre></div>
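<p>This works because a data frame is a list of equal-length columns, and a column may itself be a list (a <em>list-column</em>) whose elements can be anything, including data frames. <code>nest()</code> builds such a column from ordinary columns, collapsing the rows of each group into one tibble:</p>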
<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1"></a>df2 &lt;-<span class="st"> </span><span class="kw">tribble</span>(</span>
<span id="cb28-2"><a href="#cb28-2"></a>  <span class="op">~</span>g, <span class="op">~</span>x, <span class="op">~</span>y,</span>
<span id="cb28-3"><a href="#cb28-3"></a>   <span class="dv">1</span>,  <span class="dv">1</span>,  <span class="dv">2</span>,</span>
<span id="cb28-4"><a href="#cb28-4"></a>   <span class="dv">2</span>,  <span class="dv">4</span>,  <span class="dv">6</span>,</span>
<span id="cb28-5"><a href="#cb28-5"></a>   <span class="dv">2</span>,  <span class="dv">5</span>,  <span class="dv">7</span>,</span>
<span id="cb28-6"><a href="#cb28-6"></a>   <span class="dv">3</span>, <span class="dv">10</span>,  <span class="ot">NA</span></span>
<span id="cb28-7"><a href="#cb28-7"></a>)</span>
<span id="cb28-8"><a href="#cb28-8"></a>df2 <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">nest</span>(<span class="dt">data =</span> <span class="kw">c</span>(x, y))</span>
<span id="cb28-9"><a href="#cb28-9"></a><span class="co">## # A tibble: 3 x 2</span></span>
<span id="cb28-10"><a href="#cb28-10"></a><span class="co">##       g data                </span></span>
<span id="cb28-11"><a href="#cb28-11"></a><span class="co">##   &lt;dbl&gt; &lt;list&gt;              </span></span>
<span id="cb28-12"><a href="#cb28-12"></a><span class="co">## 1     1 &lt;tibble[,2] [1 × 2]&gt;</span></span>
<span id="cb28-13"><a href="#cb28-13"></a><span class="co">## 2     2 &lt;tibble[,2] [2 × 2]&gt;</span></span>
<span id="cb28-14"><a href="#cb28-14"></a><span class="co">## 3     3 &lt;tibble[,2] [1 × 2]&gt;</span></span>
<span id="cb28-15"><a href="#cb28-15"></a></span>
<span id="cb28-16"><a href="#cb28-16"></a><span class="co"># Equivalent grouping via group_by() (the result stays grouped):</span></span>
<span id="cb28-17"><a href="#cb28-17"></a><span class="co"># df2 %&gt;% group_by(g) %&gt;% nest()</span></span></code></pre></div>
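<p>Because <code>data</code> is now an ordinary list of tibbles, <code>purrr</code> functions can compute one value per group directly from it. A short sketch; the column names <code>n_rows</code> and <code>x_sum</code> are illustrative, not part of the original example:</p>

```r
library(tidyr)
library(tibble)
library(dplyr)
library(purrr)

df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)

# Each element of the `data` list-column is a tibble,
# so map_*() can summarise every group in one step
nested <- df2 %>%
  nest(data = c(x, y)) %>%
  mutate(
    n_rows = map_int(data, nrow),          # rows in each nested tibble
    x_sum  = map_dbl(data, ~ sum(.x$x))    # sum of x within each group
  )
nested
```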
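<p>The inverse of <code>nest()</code> is <code>unnest()</code>, which expands each nested tibble back into rows. The third tibble in <code>df1</code> has no <code>y</code> column, so that row gets <code>NA</code>:</p>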
<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb29-1"><a href="#cb29-1"></a>df1 <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">unnest</span>(data)</span>
<span id="cb29-2"><a href="#cb29-2"></a><span class="co">## # A tibble: 4 x 3</span></span>
<span id="cb29-3"><a href="#cb29-3"></a><span class="co">##       g     x     y</span></span>
<span id="cb29-4"><a href="#cb29-4"></a><span class="co">##   &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;</span></span>
<span id="cb29-5"><a href="#cb29-5"></a><span class="co">## 1     1     1     2</span></span>
<span id="cb29-6"><a href="#cb29-6"></a><span class="co">## 2     2     4     6</span></span>
<span id="cb29-7"><a href="#cb29-7"></a><span class="co">## 3     2     5     7</span></span>
<span id="cb29-8"><a href="#cb29-8"></a><span class="co">## 4     3    10    NA</span></span></code></pre></div>
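<p>A quick way to convince yourself that the two operations are inverses is a round trip: nesting <code>df2</code> and unnesting it again should give back the original rows. A minimal sketch, reusing the <code>df2</code> definition from above:</p>

```r
library(tidyr)
library(tibble)

df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)

# nest() collapses x and y into one tibble per g; unnest() expands them again
round_trip <- df2 %>%
  nest(data = c(x, y)) %>%
  unnest(data)
round_trip
```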
</div>
</div>
<div id="嵌套数据和模型" class="section level3">
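<h3>Nested data and models</h3>
<p>Nesting by group and then mapping over the list-column fits one model per group. Here <code>mtcars</code> is split by <code>cyl</code>, and a linear model of <code>mpg</code> on <code>wt</code> is fitted to each subset:</p>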
<div class="sourceCode" id="cb30"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb30-1"><a href="#cb30-1"></a>mtcars_nested &lt;-<span class="st"> </span>mtcars <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb30-2"><a href="#cb30-2"></a><span class="st">  </span><span class="kw">group_by</span>(cyl) <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb30-3"><a href="#cb30-3"></a><span class="st">  </span><span class="kw">nest</span>()</span>
<span id="cb30-4"><a href="#cb30-4"></a></span>
<span id="cb30-5"><a href="#cb30-5"></a>mtcars_nested</span>
<span id="cb30-6"><a href="#cb30-6"></a><span class="co">## # A tibble: 3 x 2</span></span>
<span id="cb30-7"><a href="#cb30-7"></a><span class="co">## # Groups:   cyl [3]</span></span>
<span id="cb30-8"><a href="#cb30-8"></a><span class="co">##     cyl data                   </span></span>
<span id="cb30-9"><a href="#cb30-9"></a><span class="co">##   &lt;dbl&gt; &lt;list&gt;                 </span></span>
<span id="cb30-10"><a href="#cb30-10"></a><span class="co">## 1     6 &lt;tibble[,10] [7 × 10]&gt; </span></span>
<span id="cb30-11"><a href="#cb30-11"></a><span class="co">## 2     4 &lt;tibble[,10] [11 × 10]&gt;</span></span>
<span id="cb30-12"><a href="#cb30-12"></a><span class="co">## 3     8 &lt;tibble[,10] [14 × 10]&gt;</span></span></code></pre></div>
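<p>The same nesting can be done without <code>group_by()</code> by telling <code>nest()</code> which columns to collapse. In this sketch, <code>data = -cyl</code> nests every column except <code>cyl</code>, and the result is not grouped (the name <code>by_cyl</code> is illustrative):</p>

```r
library(tidyr)
library(dplyr)

# Nest all columns except cyl; no group_by() needed
by_cyl <- mtcars %>% nest(data = -cyl)
by_cyl
```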
<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb31-1"><a href="#cb31-1"></a>mtcars_nested &lt;-<span class="st"> </span>mtcars_nested <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb31-2"><a href="#cb31-2"></a><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">model =</span> <span class="kw">map</span>(data, <span class="cf">function</span>(df) <span class="kw">lm</span>(mpg <span class="op">~</span><span class="st"> </span>wt, <span class="dt">data =</span> df)))</span>
<span id="cb31-3"><a href="#cb31-3"></a>mtcars_nested</span>
<span id="cb31-4"><a href="#cb31-4"></a><span class="co">## # A tibble: 3 x 3</span></span>
<span id="cb31-5"><a href="#cb31-5"></a><span class="co">## # Groups:   cyl [3]</span></span>
<span id="cb31-6"><a href="#cb31-6"></a><span class="co">##     cyl data                    model </span></span>
<span id="cb31-7"><a href="#cb31-7"></a><span class="co">##   &lt;dbl&gt; &lt;list&gt;                  &lt;list&gt;</span></span>
<span id="cb31-8"><a href="#cb31-8"></a><span class="co">## 1     6 &lt;tibble[,10] [7 × 10]&gt;  &lt;lm&gt;  </span></span>
<span id="cb31-9"><a href="#cb31-9"></a><span class="co">## 2     4 &lt;tibble[,10] [11 × 10]&gt; &lt;lm&gt;  </span></span>
<span id="cb31-10"><a href="#cb31-10"></a><span class="co">## 3     8 &lt;tibble[,10] [14 × 10]&gt; &lt;lm&gt;</span></span></code></pre></div>
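<p>Once the models sit in a list-column, further <code>map_*()</code> calls can pull out whatever you need from them. A sketch that extracts the slope of <code>wt</code> from each model; the column name <code>slope</code> is illustrative:</p>

```r
library(dplyr)
library(tidyr)
library(purrr)

fits <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # coef() returns a named vector; [["wt"]] pulls out the slope
    slope = map_dbl(model, function(m) coef(m)[["wt"]])
  ) %>%
  ungroup()

fits %>% select(cyl, slope)
```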
<div class="sourceCode" id="cb32"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb32-1"><a href="#cb32-1"></a>mtcars_nested &lt;-<span class="st"> </span>mtcars_nested <span class="op">%&gt;%</span><span class="st"> </span></span>
<span id="cb32-2"><a href="#cb32-2"></a><span class="st">  </span><span class="kw">mutate</span>(<span class="dt">model =</span> <span class="kw">map</span>(model, predict))</span>
<span id="cb32-3"><a href="#cb32-3"></a>mtcars_nested  </span>
<span id="cb32-4"><a href="#cb32-4"></a><span class="co">## # A tibble: 3 x 3</span></span>
<span id="cb32-5"><a href="#cb32-5"></a><span class="co">## # Groups:   cyl [3]</span></span>
<span id="cb32-6"><a href="#cb32-6"></a><span class="co">##     cyl data                    model     </span></span>
<span id="cb32-7"><a href="#cb32-7"></a><span class="co">##   &lt;dbl&gt; &lt;list&gt;                  &lt;list&gt;    </span></span>
<span id="cb32-8"><a href="#cb32-8"></a><span class="co">## 1     6 &lt;tibble[,10] [7 × 10]&gt;  &lt;dbl [7]&gt; </span></span>
<span id="cb32-9"><a href="#cb32-9"></a><span class="co">## 2     4 &lt;tibble[,10] [11 × 10]&gt; &lt;dbl [11]&gt;</span></span>
<span id="cb32-10"><a href="#cb32-10"></a><span class="co">## 3     8 &lt;tibble[,10] [14 × 10]&gt; &lt;dbl [14]&gt;</span></span></code></pre></div>
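<p>The block above overwrites the <code>model</code> column with its predictions, so the fitted models are no longer available afterwards. A sketch of an alternative that stores the predictions in a separate column (the name <code>pred</code> is illustrative) and then unnests them next to the original rows:</p>

```r
library(dplyr)
library(tidyr)
library(purrr)

preds <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # keep the models and put the fitted values in their own column
    pred  = map(model, predict)
  ) %>%
  ungroup() %>%
  # data and pred have the same length within each group,
  # so they can be unnested side by side
  unnest(c(data, pred))

preds %>% select(cyl, wt, mpg, pred)
```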
</div>
<div id="拆分和合并" class="section level3">
<h3>Separating and uniting</h3>
<div id="拆分" class="section level4">
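<h4>Separating</h4>
<p>Sometimes one column needs to be split into several. By default <code>separate()</code> splits on any run of non-alphanumeric characters, so <code>"a.b"</code> becomes <code>"a"</code> and <code>"b"</code>:</p>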
<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb33-1"><a href="#cb33-1"></a><span class="kw">library</span>(tidyr)</span>
<span id="cb33-2"><a href="#cb33-2"></a>df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="ot">NA</span>, <span class="st">&quot;a.b&quot;</span>, <span class="st">&quot;a.d&quot;</span>, <span class="st">&quot;b.c&quot;</span>))</span>
<span id="cb33-3"><a href="#cb33-3"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;A&quot;</span>, <span class="st">&quot;B&quot;</span>))</span>
<span id="cb33-4"><a href="#cb33-4"></a><span class="co">##      A    B</span></span>
<span id="cb33-5"><a href="#cb33-5"></a><span class="co">## 1 &lt;NA&gt; &lt;NA&gt;</span></span>
<span id="cb33-6"><a href="#cb33-6"></a><span class="co">## 2    a    b</span></span>
<span id="cb33-7"><a href="#cb33-7"></a><span class="co">## 3    a    d</span></span>
<span id="cb33-8"><a href="#cb33-8"></a><span class="co">## 4    b    c</span></span></code></pre></div>
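<p>The default separator is the regular expression <code>"[^[:alnum:]]+"</code>. This sketch spells it out and also splits on a literal dot; note that a dot must be escaped as <code>"\\."</code>, because a bare <code>"."</code> is a regex that matches any character. The variable names are illustrative:</p>

```r
library(tidyr)

df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))

implicit <- df %>% separate(x, c("A", "B"))
# Spelling out the default: any run of non-alphanumeric characters
explicit <- df %>% separate(x, c("A", "B"), sep = "[^[:alnum:]]+")
# Splitting on a literal dot needs an escaped pattern
dot      <- df %>% separate(x, c("A", "B"), sep = "\\.")

implicit
```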
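<p>When a value yields more pieces than there are new columns, the extra pieces are dropped; when it yields fewer, the missing columns are filled with <code>NA</code> on the right. By default both cases also produce a warning:</p>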
<div class="sourceCode" id="cb34"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb34-1"><a href="#cb34-1"></a>df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;a b&quot;</span>, <span class="st">&quot;a b c&quot;</span>, <span class="ot">NA</span>))</span>
<span id="cb34-2"><a href="#cb34-2"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>))</span>
<span id="cb34-3"><a href="#cb34-3"></a><span class="co">##      a    b</span></span>
<span id="cb34-4"><a href="#cb34-4"></a><span class="co">## 1    a &lt;NA&gt;</span></span>
<span id="cb34-5"><a href="#cb34-5"></a><span class="co">## 2    a    b</span></span>
<span id="cb34-6"><a href="#cb34-6"></a><span class="co">## 3    a    b</span></span>
<span id="cb34-7"><a href="#cb34-7"></a><span class="co">## 4 &lt;NA&gt; &lt;NA&gt;</span></span></code></pre></div>
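<p>Setting <code>extra = "drop"</code> discards the extra pieces silently, and <code>fill</code> (<code>"right"</code> or <code>"left"</code>) chooses which side the <code>NA</code> padding goes on:</p>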
<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb35-1"><a href="#cb35-1"></a><span class="co"># The same behaviour as previous, but drops the c without warnings:</span></span>
<span id="cb35-2"><a href="#cb35-2"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>), <span class="dt">extra =</span> <span class="st">&quot;drop&quot;</span>, <span class="dt">fill =</span> <span class="st">&quot;right&quot;</span>)</span>
<span id="cb35-3"><a href="#cb35-3"></a><span class="co">##      a    b</span></span>
<span id="cb35-4"><a href="#cb35-4"></a><span class="co">## 1    a &lt;NA&gt;</span></span>
<span id="cb35-5"><a href="#cb35-5"></a><span class="co">## 2    a    b</span></span>
<span id="cb35-6"><a href="#cb35-6"></a><span class="co">## 3    a    b</span></span>
<span id="cb35-7"><a href="#cb35-7"></a><span class="co">## 4 &lt;NA&gt; &lt;NA&gt;</span></span></code></pre></div>
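<p><code>extra = "merge"</code> keeps the remainder in the last column (note <code>"b c"</code> in row 3), and <code>fill = "left"</code> pads with <code>NA</code> on the left:</p>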
<div class="sourceCode" id="cb36"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb36-1"><a href="#cb36-1"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>), <span class="dt">extra =</span> <span class="st">&quot;merge&quot;</span>, <span class="dt">fill =</span> <span class="st">&quot;left&quot;</span>)</span>
<span id="cb36-2"><a href="#cb36-2"></a><span class="co">##      a    b</span></span>
<span id="cb36-3"><a href="#cb36-3"></a><span class="co">## 1 &lt;NA&gt;    a</span></span>
<span id="cb36-4"><a href="#cb36-4"></a><span class="co">## 2    a    b</span></span>
<span id="cb36-5"><a href="#cb36-5"></a><span class="co">## 3    a  b c</span></span>
<span id="cb36-6"><a href="#cb36-6"></a><span class="co">## 4 &lt;NA&gt; &lt;NA&gt;</span></span></code></pre></div>
<p>Or keep every piece by supplying enough column names:</p>
<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb37-1"><a href="#cb37-1"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="st">&quot;b&quot;</span>, <span class="st">&quot;c&quot;</span>))</span>
<span id="cb37-2"><a href="#cb37-2"></a><span class="co">##      a    b    c</span></span>
<span id="cb37-3"><a href="#cb37-3"></a><span class="co">## 1    a &lt;NA&gt; &lt;NA&gt;</span></span>
<span id="cb37-4"><a href="#cb37-4"></a><span class="co">## 2    a    b &lt;NA&gt;</span></span>
<span id="cb37-5"><a href="#cb37-5"></a><span class="co">## 3    a    b    c</span></span>
<span id="cb37-6"><a href="#cb37-6"></a><span class="co">## 4 &lt;NA&gt; &lt;NA&gt; &lt;NA&gt;</span></span></code></pre></div>
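<p>Set the separator explicitly with <code>sep</code>. None of these values contains <code>": "</code>, so each whole string lands in <code>key</code> and <code>value</code> is <code>NA</code>:</p>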
<div class="sourceCode" id="cb38"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb38-1"><a href="#cb38-1"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;key&quot;</span>, <span class="st">&quot;value&quot;</span>), <span class="dt">sep =</span> <span class="st">&quot;: &quot;</span>, <span class="dt">extra =</span> <span class="st">&quot;merge&quot;</span>)</span>
<span id="cb38-2"><a href="#cb38-2"></a><span class="co">##     key value</span></span>
<span id="cb38-3"><a href="#cb38-3"></a><span class="co">## 1     a  &lt;NA&gt;</span></span>
<span id="cb38-4"><a href="#cb38-4"></a><span class="co">## 2   a b  &lt;NA&gt;</span></span>
<span id="cb38-5"><a href="#cb38-5"></a><span class="co">## 3 a b c  &lt;NA&gt;</span></span>
<span id="cb38-6"><a href="#cb38-6"></a><span class="co">## 4  &lt;NA&gt;  &lt;NA&gt;</span></span></code></pre></div>
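<p><code>sep</code> is interpreted as a regular expression, so a character class such as <code>[.?:]</code> splits on any of several characters:</p>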
<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb39-1"><a href="#cb39-1"></a><span class="co"># Use regular expressions to separate on multiple characters:</span></span>
<span id="cb39-2"><a href="#cb39-2"></a>df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="ot">NA</span>, <span class="st">&quot;a?b&quot;</span>, <span class="st">&quot;a.d&quot;</span>, <span class="st">&quot;b:c&quot;</span>))</span>
<span id="cb39-3"><a href="#cb39-3"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">separate</span>(x, <span class="kw">c</span>(<span class="st">&quot;A&quot;</span>,<span class="st">&quot;B&quot;</span>), <span class="dt">sep =</span> <span class="st">&quot;([.?:])&quot;</span>)</span>
<span id="cb39-4"><a href="#cb39-4"></a><span class="co">##      A    B</span></span>
<span id="cb39-5"><a href="#cb39-5"></a><span class="co">## 1 &lt;NA&gt; &lt;NA&gt;</span></span>
<span id="cb39-6"><a href="#cb39-6"></a><span class="co">## 2    a    b</span></span>
<span id="cb39-7"><a href="#cb39-7"></a><span class="co">## 3    a    d</span></span>
<span id="cb39-8"><a href="#cb39-8"></a><span class="co">## 4    b    c</span></span></code></pre></div>
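<p><code>sep</code> can also be a number, in which case the string is cut at that character position rather than at a pattern. A sketch with hypothetical fixed-width codes (<code>codes</code>, <code>letters</code> and <code>digit</code> are illustrative names):</p>

```r
library(tidyr)

codes <- data.frame(x = c("ab1", "cd2", "ef3"))
# A numeric sep splits at a character position:
# sep = 2 cuts after the second character
split_codes <- codes %>% separate(x, c("letters", "digit"), sep = 2)
split_codes
```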
</div>
<div id="新列提取" class="section level4">
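<h4>Extracting new columns</h4>
<p><code>extract()</code> turns each capture group of a regular expression into a new column. With the default pattern <code>"([[:alnum:]]+)"</code>, only the first run of letters and digits is kept:</p>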
<div class="sourceCode" id="cb40"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb40-1"><a href="#cb40-1"></a>df &lt;-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="ot">NA</span>, <span class="st">&quot;a-b&quot;</span>, <span class="st">&quot;a-d&quot;</span>, <span class="st">&quot;b-c&quot;</span>, <span class="st">&quot;d-e&quot;</span>))</span>
<span id="cb40-2"><a href="#cb40-2"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">extract</span>(x, <span class="st">&quot;A&quot;</span>)</span>
<span id="cb40-3"><a href="#cb40-3"></a><span class="co">##      A</span></span>
<span id="cb40-4"><a href="#cb40-4"></a><span class="co">## 1 &lt;NA&gt;</span></span>
<span id="cb40-5"><a href="#cb40-5"></a><span class="co">## 2    a</span></span>
<span id="cb40-6"><a href="#cb40-6"></a><span class="co">## 3    a</span></span>
<span id="cb40-7"><a href="#cb40-7"></a><span class="co">## 4    b</span></span>
<span id="cb40-8"><a href="#cb40-8"></a><span class="co">## 5    d</span></span>
<span id="cb40-9"><a href="#cb40-9"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">extract</span>(x, <span class="kw">c</span>(<span class="st">&quot;A&quot;</span>, <span class="st">&quot;B&quot;</span>), <span class="st">&quot;([[:alnum:]]+)-([[:alnum:]]+)&quot;</span>)</span>
<span id="cb40-10"><a href="#cb40-10"></a><span class="co">##      A    B</span></span>
<span id="cb40-11"><a href="#cb40-11"></a><span class="co">## 1 &lt;NA&gt; &lt;NA&gt;</span></span>
<span id="cb40-12"><a href="#cb40-12"></a><span class="co">## 2    a    b</span></span>
<span id="cb40-13"><a href="#cb40-13"></a><span class="co">## 3    a    d</span></span>
<span id="cb40-14"><a href="#cb40-14"></a><span class="co">## 4    b    c</span></span>
<span id="cb40-15"><a href="#cb40-15"></a><span class="co">## 5    d    e</span></span>
<span id="cb40-16"><a href="#cb40-16"></a><span class="co"># [:alnum:] matches letters and digits</span></span></code></pre></div>
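<p>Under the hood all of this is string processing driven by <a href="http://baiy.cn/utils/_regex_doc/index.htm">regular expressions</a> (the linked reference is in Chinese).</p>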
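<p>Two details worth knowing about <code>extract()</code>: a row that does not match the pattern gets <code>NA</code> in every new column, and <code>convert = TRUE</code> converts the extracted strings to a more suitable type. A sketch with hypothetical data and column names:</p>

```r
library(tidyr)

df <- data.frame(x = c("a-1", "b-22", "c-x"))
# "c-x" has no digits after the dash, so it does not match and yields NA;
# convert = TRUE turns "1" and "22" into integers
res <- df %>% extract(x, c("letter", "num"), "([a-z]+)-([0-9]+)", convert = TRUE)
res
```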
</div>
<div id="合并" class="section level4">
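<h4>Uniting</h4>
<p><code>unite()</code> pastes several columns into one, separated by <code>"_"</code> by default. Missing values are pasted as the string <code>"NA"</code>:</p>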
<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb41-1"><a href="#cb41-1"></a>df &lt;-<span class="st"> </span><span class="kw">expand_grid</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="ot">NA</span>), <span class="dt">y =</span> <span class="kw">c</span>(<span class="st">&quot;b&quot;</span>, <span class="ot">NA</span>))</span>
<span id="cb41-2"><a href="#cb41-2"></a>df</span>
<span id="cb41-3"><a href="#cb41-3"></a><span class="co">## # A tibble: 4 x 2</span></span>
<span id="cb41-4"><a href="#cb41-4"></a><span class="co">##   x     y    </span></span>
<span id="cb41-5"><a href="#cb41-5"></a><span class="co">##   &lt;chr&gt; &lt;chr&gt;</span></span>
<span id="cb41-6"><a href="#cb41-6"></a><span class="co">## 1 a     b    </span></span>
<span id="cb41-7"><a href="#cb41-7"></a><span class="co">## 2 a     &lt;NA&gt; </span></span>
<span id="cb41-8"><a href="#cb41-8"></a><span class="co">## 3 &lt;NA&gt;  b    </span></span>
<span id="cb41-9"><a href="#cb41-9"></a><span class="co">## 4 &lt;NA&gt;  &lt;NA&gt;</span></span>
<span id="cb41-10"><a href="#cb41-10"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">unite</span>(<span class="st">&quot;z&quot;</span>, x<span class="op">:</span>y, <span class="dt">remove =</span> <span class="ot">FALSE</span>)</span>
<span id="cb41-11"><a href="#cb41-11"></a><span class="co">## # A tibble: 4 x 3</span></span>
<span id="cb41-12"><a href="#cb41-12"></a><span class="co">##   z     x     y    </span></span>
<span id="cb41-13"><a href="#cb41-13"></a><span class="co">##   &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;</span></span>
<span id="cb41-14"><a href="#cb41-14"></a><span class="co">## 1 a_b   a     b    </span></span>
<span id="cb41-15"><a href="#cb41-15"></a><span class="co">## 2 a_NA  a     &lt;NA&gt; </span></span>
<span id="cb41-16"><a href="#cb41-16"></a><span class="co">## 3 NA_b  &lt;NA&gt;  b    </span></span>
<span id="cb41-17"><a href="#cb41-17"></a><span class="co">## 4 NA_NA &lt;NA&gt;  &lt;NA&gt;</span></span>
<span id="cb41-18"><a href="#cb41-18"></a><span class="co"># expand_grid() builds every combination of its inputs, like a Cartesian product</span></span></code></pre></div>
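<p>The separator is controlled by <code>sep</code>, and the columns to combine can also be listed one by one instead of as a range. A sketch (the name <code>dashed</code> is illustrative):</p>

```r
library(tidyr)

df <- expand_grid(x = c("a", NA), y = c("b", NA))
# Paste x and y with a dash; the original columns are removed by default
dashed <- df %>% unite("z", x, y, sep = "-")
dashed$z
```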
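<p>Set <code>na.rm = TRUE</code> to drop missing values before pasting; if every input is missing, the result is an empty string:</p>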
<div class="sourceCode" id="cb42"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb42-1"><a href="#cb42-1"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">unite</span>(<span class="st">&quot;z&quot;</span>, x<span class="op">:</span>y, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>, <span class="dt">remove =</span> <span class="ot">FALSE</span>)</span>
<span id="cb42-2"><a href="#cb42-2"></a><span class="co">## # A tibble: 4 x 3</span></span>
<span id="cb42-3"><a href="#cb42-3"></a><span class="co">##   z     x     y    </span></span>
<span id="cb42-4"><a href="#cb42-4"></a><span class="co">##   &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;</span></span>
<span id="cb42-5"><a href="#cb42-5"></a><span class="co">## 1 &quot;a_b&quot; a     b    </span></span>
<span id="cb42-6"><a href="#cb42-6"></a><span class="co">## 2 &quot;a&quot;   a     &lt;NA&gt; </span></span>
<span id="cb42-7"><a href="#cb42-7"></a><span class="co">## 3 &quot;b&quot;   &lt;NA&gt;  b    </span></span>
<span id="cb42-8"><a href="#cb42-8"></a><span class="co">## 4 &quot;&quot;    &lt;NA&gt;  &lt;NA&gt;</span></span></code></pre></div>
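<p>Uniting and then separating again. Note that the missing values come back as the string <code>"NA"</code> (printed without angle brackets), not as real <code>NA</code>:</p>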
<div class="sourceCode" id="cb43"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb43-1"><a href="#cb43-1"></a>df <span class="op">%&gt;%</span></span>
<span id="cb43-2"><a href="#cb43-2"></a><span class="st">  </span><span class="kw">unite</span>(<span class="st">&quot;xy&quot;</span>, x<span class="op">:</span>y) <span class="op">%&gt;%</span></span>
<span id="cb43-3"><a href="#cb43-3"></a><span class="st">  </span><span class="kw">separate</span>(xy, <span class="kw">c</span>(<span class="st">&quot;x&quot;</span>, <span class="st">&quot;y&quot;</span>))</span>
<span id="cb43-4"><a href="#cb43-4"></a><span class="co">## # A tibble: 4 x 2</span></span>
<span id="cb43-5"><a href="#cb43-5"></a><span class="co">##   x     y    </span></span>
<span id="cb43-6"><a href="#cb43-6"></a><span class="co">##   &lt;chr&gt; &lt;chr&gt;</span></span>
<span id="cb43-7"><a href="#cb43-7"></a><span class="co">## 1 a     b    </span></span>
<span id="cb43-8"><a href="#cb43-8"></a><span class="co">## 2 a     NA   </span></span>
<span id="cb43-9"><a href="#cb43-9"></a><span class="co">## 3 NA    b    </span></span>
<span id="cb43-10"><a href="#cb43-10"></a><span class="co">## 4 NA    NA</span></span></code></pre></div>
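<p>If real missing values are needed after such a round trip, one option is to convert the <code>"NA"</code> strings back with <code>dplyr::na_if()</code>. This sketch assumes no genuine value is the literal text <code>"NA"</code>, since that would be converted too:</p>

```r
library(tidyr)
library(dplyr)

df <- expand_grid(x = c("a", NA), y = c("b", NA))
restored <- df %>%
  unite("xy", x:y) %>%
  separate(xy, c("x", "y")) %>%
  # unite() wrote missing values as the text "NA"; turn them back into NA
  mutate(across(c(x, y), ~ na_if(.x, "NA")))
restored
```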
</div>
</div>
<div id="缺失值处理" class="section level3">
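<h3>Handling missing values</h3>
<p><code>replace_na()</code> replaces missing values with a given value. On a data frame, pass a named list with one replacement per column:</p>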
<div class="sourceCode" id="cb44"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb44-1"><a href="#cb44-1"></a>df &lt;-<span class="st"> </span><span class="kw">tibble</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">2</span>, <span class="ot">NA</span>), <span class="dt">y =</span> <span class="kw">c</span>(<span class="st">&quot;a&quot;</span>, <span class="ot">NA</span>, <span class="st">&quot;b&quot;</span>))</span>
<span id="cb44-2"><a href="#cb44-2"></a>df <span class="op">%&gt;%</span><span class="st"> </span><span class="kw">replace_na</span>(<span class="kw">list</span>(<span class="dt">x =</span> <span class="dv">0</span>, <span class="dt">y =</span> <span class="st">&quot;unknown&quot;</span>))</span>
<span id="cb44-3"><a href="#cb44-3"></a><span class="co">## # A tibble: 3 x 2</span></span>
<span id="cb44-4"><a href="#cb44-4"></a><span class="co">##       x y      </span></span>
<span id="cb44-5"><a href="#cb44-5"></a><span class="co">##   &lt;dbl&gt; &lt;chr&gt;  </span></span>
<span id="cb44-6"><a href="#cb44-6"></a><span class="co">## 1     1 a      </span></span>
<span id="cb44-7"><a href="#cb44-7"></a><span class="co">## 2     2 unknown</span></span>
<span id="cb44-8"><a href="#cb44-8"></a><span class="co">## 3     0 b</span></span></code></pre></div>
<p>Inside <code>mutate()</code>, <code>replace_na()</code> also works on a single column with a single replacement value:</p>
<div class="sourceCode" id="cb45"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb45-1"><a href="#cb45-1"></a>df <span class="op">%&gt;%</span><span class="st"> </span>dplyr<span class="op">::</span><span class="kw">mutate</span>(<span class="dt">x =</span> <span class="kw">replace_na</span>(x, <span class="dv">0</span>))</span>
<span id="cb45-2"><a href="#cb45-2"></a><span class="co">## # A tibble: 3 x 2</span></span>
<span id="cb45-3"><a href="#cb45-3"></a><span class="co">##       x y    </span></span>
<span id="cb45-4"><a href="#cb45-4"></a><span class="co">##   &lt;dbl&gt; &lt;chr&gt;</span></span>
<span id="cb45-5"><a href="#cb45-5"></a><span class="co">## 1     1 a    </span></span>
<span id="cb45-6"><a href="#cb45-6"></a><span class="co">## 2     2 &lt;NA&gt; </span></span>
<span id="cb45-7"><a href="#cb45-7"></a><span class="co">## 3     0 b</span></span></code></pre></div>
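<p>Because <code>replace_na()</code> accepts a plain vector, it combines with <code>dplyr::across()</code> to fill several columns at once. A sketch with hypothetical data; <code>where(is.numeric)</code> selects only the numeric columns, so the character column keeps its <code>NA</code>:</p>

```r
library(tidyr)
library(dplyr)

# replace_na() on a bare vector
replace_na(c(1, NA, 3), 0)

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"), z = c(NA, 5, 6))
# Replace NA with 0 in every numeric column; y is left untouched
filled <- df %>% mutate(across(where(is.numeric), ~ replace_na(.x, 0)))
filled
```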
</div>
</div>
